BAYESIAN METHODS IN THE SHAPE INVARIANT
MODEL (I): POSTERIOR CONTRACTION RATES ON 
PROBABILITY MEASURES 

By Dominique Bontemps* and Sébastien Gadat*
Institut de Mathématiques de Toulouse, Université Paul Sabatier

In this paper, we consider the so-called Shape Invariant Model,
which describes the estimation of a function $f^0$ subjected to a random
translation of law $g^0$ in a white noise model. We are interested
in such a model when the law of the deformations is unknown. We
aim to recover the law of the process $\mathbb{P}_{f^0,g^0}$.

In this perspective, we adopt a Bayesian point of view and find
priors on $f$ and $g$ such that the posterior distribution concentrates at a
polynomial rate around $\mathbb{P}_{f^0,g^0}$ when $n$ goes to $+\infty$. We make intensive use of
some Bayesian non parametric tools coupled with mixture models and
believe that some of our results obtained in this mixture framework
may also be of interest from a frequentist point of view.

1. Introduction. We are interested in this work in the so-called Shape
Invariant Model (SIM). This model describes a statistical process
which involves a deformation of a functional shape according to some
randomized geometric variability. Such a geometric deformation of a common
unknown shape is well suited to numerous fields, such as image
processing (see for instance [AGP91] or [PMRC10]). It corresponds to a
particular case of Grenander's general theory of shapes (see [GM07] for a
detailed introduction to this topic). This kind of model is also useful in
medicine: the recent work of [Big11] deals with the differentiation between
normal and arrhythmic cycles in electrocardiograms. It appears in genetics
when one deals with delayed activation curves of genes when drugs are
administered to patients, or in ChIP-Seq estimation when translations in
protein fixation yield randomly shifted counting processes (see for instance
[MMW07] and [BGKM12]). It also occurs in econometrics for the analysis of
Engel curves [BCK07] and in landmark registration [Big06].

Such a model has received considerable interest in the statistical community,
as shown by the large number of references on this subject. Some works
consider a semi-parametric approach for the estimation (self-modeling
regression framework used by [KG88] and [BGV09]). In [Cas12], the author
applies some Bayesian techniques to obtain statistical results on the SIM in
a semi-parametric setting when the level of noise on the observations
asymptotically vanishes. Older approaches use parametric settings (see [GM01] and
the discussion therein for an overview) and study the so-called Fréchet mean
of pattern. Standard M-estimation or Bayesian methods are exploited in
[BGL09] or [AAT07], and the same authors develop in [AKT10] a stochastic
algorithm to run estimation in such a model. Some recent works follow
testing strategies to obtain curve registration [CD11], [Col12]. Finally,
note that [BG10] obtains some minimax adaptive results for non-parametric
estimation in the Shape Invariant Model when one knows the law of the
randomized translations.

All these works are interested in the statistical process of deformation of 
the "mean common shape" and generally aim to recover this unknown func- 
tional object according to noisy i.i.d. observations. Moreover, the Shape In- 
variant Model is considered as a standard benchmark for statistical methods 
which aim to compute estimations in some more general deformable models. 
Of course, the SIM could be extended to some more general situations of ge- 
ometrical deformations described through an action of a finite dimensional 
Lie Group (see [BCG12] for a precise non parametric description). We have 
decided to restrict our work here to the simplest case of the one dimensional 
Lie group of translation to warp the functional objects. 

This work has been inspired by several discussions with Alain Trouvé
about the work [AKT10] on the Shape Invariant Model. We
aim to extend their parametric Bayesian framework to the non-parametric
setting and then study the behaviour of some posterior distributions. Hence,
the motivation of the paper is mainly theoretical: we want to describe the
asymptotic evolution of the posterior probability distributions when data
come from the SIM. Of course, we need to build suitable priors which
yield good contraction rates for this posterior distribution. We have decided to
consider the general case where both the functional shape and the probability
distribution of the deformations are unknown, since it corresponds to the
most realistic case. To the best of our knowledge, no sharp statistical
results have been derived yet in this non-parametric situation.

Our work will describe the evolution of the posterior distribution when
the number of observations grows to $+\infty$ with a fixed noise level $\sigma$. It is an
important difference with the study of the asymptotically vanishing noise
situation ($\sigma \to 0$). It is itself a special feature of the Shape Invariant Model:
there is no obvious Le Cam equivalence of experiments (see [LCY00]) for the
SIM between the experiments when $n \to +\infty$ and when $\sigma \to 0$. It is
illustrated by the very different minimax results obtained in [BG10] ($n \to +\infty$)
and in [BG12] ($\sigma \to 0$). We will use in the sequel quite standard Bayesian
non parametric methods to obtain the frequentist consistency and some
contraction rates of the Bayesian procedures. Such tools rely on some important
contributions of [BSW99] and [GGvdV00] for the posterior behaviour in
general situations, as well as Bayesian properties of mixture models stated in
[GvdV01] and [GW00].

The paper is organised as follows. Section 2 presents a sharp description 
of the Shape Invariant Model (shortened as SIM in the sequel), as well as 
standard elements on Bayesian and Fourier analysis. It also provides some 
notations for mixture models. It ends with the statement of the posterior 
contraction around the true law on functional curves, which is our main 
result. Section 3 provides a metric description of the important probability 
spaces of the model. Finally, Section 4 presents the proof of this main result.
We end the paper with numerous challenging issues. 

We gather in the appendix sections some technical points: the metric de- 
scription of the Shape Invariant Model embedded in a special randomized 
curves space and the calibration of suitable priors for the SIM. 

2. Model, notations and main results. 

2.1. Statistical settings. 

Shape Invariant Model. We recall here the random Shape Invariant Model.
We assume $f^0$ to be a function which belongs to a subset $\mathcal{F}$ of smooth
functions. We also consider a probability measure $g^0$ which is an element
of the set $\mathfrak{M}([0,1])$, the set of probability measures
on $[0,1]$. We observe $n$ realizations of noisy and randomly shifted complex
valued curves $Y_1, \ldots, Y_n$ coming from the following white noise model

(2.1) $\forall x \in [0,1] \quad \forall j = 1 \ldots n \qquad dY_j(x) := f^0(x - \tau_j)\,dx + \sigma\, dW_j(x).$

Here, $f^0$ is the mean pattern of the curves $Y_1, \ldots, Y_n$, while the random
shifts $(\tau_j)_{j=1 \ldots n}$ are sampled independently according to the probability
measure $g^0$. Moreover, $(W_j)_{j=1 \ldots n}$ are independent complex standard Brownian
motions on $[0,1]$ and model the presence of noise in the observations. The
noise level $\sigma$ is kept fixed in our study and is set to 1 for the sake of simplicity.

In the sequel, $f^{-\tau}$ will denote the pattern $f$ shifted by $\tau$, that is to say
the function $x \mapsto f(x - \tau)$. Complex valued curves are considered here for
the simplicity of notations. However, all our results can be adapted to the
simpler case where all curves $Y_j$ are real valued. A complex standard
Brownian motion $W_t$ on $[0,1]$ is such that $W_1$ is a standard complex Gaussian
random variable, whose distribution is denoted by $\mathcal{N}_{\mathbb{C}}(0,1)$; a standard
complex Gaussian random variable has independent real and imaginary parts
with a real centered Gaussian distribution of variance 1/2.

This work will address the question of the behaviour of some posterior
distributions on $\mathcal{F} \otimes \mathfrak{M}([0,1])$ given some functional $n$-sample $(Y_1, \ldots, Y_n)$.
Since our work will be mainly asymptotic with $n \to +\infty$, we make intensive
use of some standard notation such as "$\lesssim$", which refers to an inequality up
to a multiplicative absolute constant. In the meantime, $a \sim b$ stands for
$a/b \to 1$.

Bayesian framework. Since most statistical works on the SIM are
frequentist, we have decided to briefly recall here the Bayesian formalism
following the presentation of [GGvdV00]. Readers familiar with it can omit this
paragraph.

The functional objects $f^0$ and $g^0$ we are looking for belong to $\mathcal{F} \otimes \mathfrak{M}([0,1])$,
and for any couple $(f,g) \in \mathcal{F} \otimes \mathfrak{M}([0,1])$, equation (2.1) describes the law
of one continuous curve. Its law is denoted $\mathbb{P}_{f,g}$ and possesses a density $p_{f,g}$
with respect to the Wiener measure on the sample space. Since $f^0$ and $g^0$
are unknown, $\mathbb{P}_{f^0,g^0}$ is also unavailable but belongs to a set $\mathcal{P}$ of probability
measures over the sample space. This set $\mathcal{P}$ is the set of all possible measures
described by (2.1) when $(f,g)$ varies in $\mathcal{F} \otimes \mathfrak{M}([0,1])$.

Given some prior distribution $\Pi_n$ on $\mathcal{P}$ (generally defined through a prior
on $\mathcal{F} \otimes \mathfrak{M}([0,1])$), Bayesian procedures are generally built using the posterior
distribution defined by

$\Pi_n(B \mid Y_1, \ldots, Y_n) = \frac{\int_B \prod_{j=1}^n p(Y_j)\, d\Pi_n(p)}{\int_{\mathcal{P}} \prod_{j=1}^n p(Y_j)\, d\Pi_n(p)},$

which is a random measure on $\mathcal{P}$ that depends on the observations $Y_1, \ldots, Y_n$.
For instance, Bayesian estimators can be obtained using the mode, the mean
or the median of the posterior distribution. This is exactly the approach
adopted by [AKT10], which is mainly dedicated to computing such a posterior
mean in a parametric setting with a stochastic EM algorithm.

The posterior distribution is then said to be consistent if it concentrates on
arbitrarily small neighbourhoods of $\mathbb{P}_{f^0,g^0}$ in $\mathcal{P}$ with a probability tending
to 1 when $n$ grows to $+\infty$. One frequentist property of such a posterior
distribution is the rate at which such neighbourhoods can shrink while
still capturing most of the posterior mass. According to equation (2.1), we thus
tackle such a Bayesian consistency and compute such convergence rates in
the frequentist paradigm. Of course, these properties will highly depend on
the metric structure of the sets $\mathcal{P}$ and $\mathcal{F}$.

Functional setting and Fourier analysis. Without loss of generality, the
function $f^0$ is assumed to be periodic with period 1 and to belong to a
subset $\mathcal{F}$ of $L^2_{\mathbb{C}}([0,1])$, the space of square integrable functions on $[0,1]$
endowed with the euclidean norm $\|h\| := \left(\int_0^1 |h(s)|^2 ds\right)^{1/2}$. Moreover, each element
$h \in L^2_{\mathbb{C}}([0,1])$ may naturally be extended to a periodic function on $\mathbb{R}$ of
period 1. Since we will make intensive use of Fourier analysis in the sequel,
let us first recall some notations: $i$ will stand for the complex number such
that $i^2 = -1$. The Fourier coefficients of $h$ are denoted

(2.2) $\theta_\ell(h) := \int_0^1 e^{-i 2\pi \ell t} h(t)\, dt.$

Throughout the paper, we will often parametrise any element
$h \in L^2_{\mathbb{C}}([0,1])$ through its Fourier expansion and will simply use the notation
$(\theta_\ell)_{\ell \in \mathbb{Z}}$ instead of $(\theta_\ell(h))_{\ell \in \mathbb{Z}}$.
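As a small numerical illustration of (2.2), the sketch below approximates the Fourier coefficients by a Riemann sum and checks that shifting a pattern by $\tau$ multiplies its $\ell$-th coefficient by $e^{-i2\pi\ell\tau}$, which is the identity behind the mixture formulation used later. The function names and the test pattern are hypothetical choices made for this example only.

```python
import cmath
import math


def fourier_coefficient(h, ell, num_points=2048):
    """Riemann-sum approximation of theta_ell(h) = int_0^1 exp(-i 2 pi ell t) h(t) dt."""
    total = 0j
    for k in range(num_points):
        t = k / num_points
        total += cmath.exp(-2j * math.pi * ell * t) * h(t)
    return total / num_points


def shift(h, tau):
    """Return the 1-periodic function x -> h(x - tau)."""
    return lambda x: h((x - tau) % 1.0)


def pattern(x):
    """Hypothetical smooth 1-periodic pattern, used only for illustration."""
    return math.cos(2 * math.pi * x) + 0.5 * math.sin(4 * math.pi * x)


tau = 0.3
for ell in (-2, -1, 0, 1, 2):
    shifted = fourier_coefficient(shift(pattern, tau), ell)
    rotated = fourier_coefficient(pattern, ell) * cmath.exp(-2j * math.pi * ell * tau)
    # The two quantities agree up to floating point error.
    assert abs(shifted - rotated) < 1e-9
```

The Riemann sum is exact (up to rounding) for trigonometric polynomials of low degree such as this pattern; for a general function it is only an approximation.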

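Our work is dedicated to the analysis of the SIM when $\mathcal{F}$ models smooth
functions of $[0,1]$. Hence, natural subspaces of $L^2_{\mathbb{C}}([0,1])$ are the Sobolev spaces
$\mathcal{H}_s$ with a smoothness parameter $s$:

$\mathcal{H}_s := \left\{ f \in L^2_{\mathbb{C}}([0,1]) \;\middle|\; \sum_{\ell \in \mathbb{Z}} (1 + |\ell|^{2s}) |\theta_\ell(f)|^2 < +\infty \right\}.$

In the sequel, we aim to find priors on $\mathcal{P}$ that reach good frequentist
properties and, if possible, are adaptive with respect to the smoothness parameter $s$, since this
parameter is generally unknown. We will consider only regular cases
where $s \ge 1$; the quantity $\sum_{\ell \in \mathbb{Z}} \ell^2 |\theta_\ell|^2$ is thus bounded, and we denote by
$\|\theta\|_{\mathcal{H}_1}$ the associated Sobolev norm.

It will also be useful to consider in some cases Fourier "thresholded" elements
of $\mathcal{H}_s$. Hence, we set for any integer $\ell$ (which is the frequency threshold)

$\mathcal{H}^\ell := \left\{ f \in L^2_{\mathbb{C}}([0,1]) \;\middle|\; \forall |k| > \ell \quad \theta_k(f) = 0 \right\}.$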

Mixture model. According to equation (2.1), we can write in the Fourier
domain that

$\forall \ell \in \mathbb{Z} \quad \forall j \in \{1 \ldots n\} \qquad \theta_\ell(Y_j) = \theta^0_\ell e^{-i 2\pi \ell \tau_j} + \xi_{\ell,j},$

where $\theta^0 := (\theta^0_\ell)_{\ell \in \mathbb{Z}}$ denotes the true unknown Fourier coefficients of $f^0$.
Owing to the white noise model, the variables $(\xi_{\ell,j})_{\ell,j}$ are independent standard
(complex) Gaussian random variables: $\xi_{\ell,j} \sim_{i.i.d.} \mathcal{N}_{\mathbb{C}}(0,1)$, $\forall \ell, j$.
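The Fourier-domain form of the model is easy to simulate. The following sketch draws observations $\theta_\ell(Y_j) = \theta^0_\ell e^{-i2\pi\ell\tau_j} + \xi_{\ell,j}$ for a finite set of frequencies. The particular coefficients, the uniform law used for the shifts and the function names are assumptions made for the example; they are not prescribed by the model.

```python
import cmath
import math
import random


def complex_standard_gaussian(rng):
    """Draw from N_C(0, 1): independent real and imaginary parts of variance 1/2."""
    sd = math.sqrt(0.5)
    return complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))


def simulate_fourier_observations(theta0, shifts, rng):
    """theta0: dict {ell: theta0_ell}; shifts: list of tau_j in [0, 1].

    Returns one dict {ell: theta_ell(Y_j)} per observed curve.
    """
    curves = []
    for tau in shifts:
        curve = {}
        for ell, coeff in theta0.items():
            rotation = cmath.exp(-2j * math.pi * ell * tau)
            curve[ell] = coeff * rotation + complex_standard_gaussian(rng)
        curves.append(curve)
    return curves


rng = random.Random(0)
theta0 = {-1: 0.5 + 0j, 0: 1.0 + 0j, 1: 0.5 + 0j}  # hypothetical true coefficients
shifts = [rng.random() for _ in range(5)]          # hypothetical g^0: uniform on [0, 1]
observations = simulate_fourier_observations(theta0, shifts, rng)
```

Note that a single shift $\tau_j$ rotates all the frequencies of curve $j$ at once, so the coefficients of one curve are dependent even though the noise terms are independent.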

For the sake of simplicity, $\gamma$ will refer to $\gamma(z) := \pi^{-1} e^{-|z|^2}$, $\forall z \in \mathbb{C}$, the
density of the standard complex Gaussian centered distribution $\mathcal{N}_{\mathbb{C}}(0,1)$, and
$\gamma_\mu(\cdot) := \gamma(\cdot - \mu)$ is the density of the standard complex Gaussian with mean
$\mu$. We keep the same notation for $p$ dimensional complex Gaussian
densities $\gamma(z) := \pi^{-p} e^{-\|z\|^2}$, $\forall z \in \mathbb{C}^p$, where $\|z\|$ is the euclidean $p$ dimensional
norm of the complex vector $z$.

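For any frequency $\ell$, equation (2.1) implies that $\theta_\ell(Y)$ follows a mixture
of complex Gaussian standard variables with mean $\theta^0_\ell e^{-i2\pi\ell\varphi}$, $\varphi \in [0,1]$:

$\theta_\ell(Y) \sim \int_0^1 \gamma_{\theta^0_\ell e^{-i2\pi\ell\varphi}}(\cdot)\, dg^0(\varphi).$

In the sequel, for any phase $\varphi \in [0,1]$ sampled according to any distribution
$g$, and for any $\theta \in \ell^2(\mathbb{Z})$, $\theta \bullet \varphi$ will denote the element of $\ell^2(\mathbb{Z})$ given by

$\forall \ell \in \mathbb{Z} \qquad (\theta \bullet \varphi)_\ell := \theta_\ell e^{-i2\pi\ell\varphi}.$

When $\theta$ is a complex vector, for instance $\theta = (\theta_{-\ell}, \ldots, \theta_\ell)$, we keep the same
notation $\theta \bullet \varphi := (\theta_{-\ell} e^{i2\pi\ell\varphi}, \ldots, \theta_0, \theta_1 e^{-i2\pi\varphi}, \ldots, \theta_\ell e^{-i2\pi\ell\varphi})$ to refer to the
$2\ell+1$ dimensional vector. It corresponds to a rotation of each coefficient $\theta_\ell$
around the origin with an angle $2\pi\ell\varphi$. According to this notation, the law of
the infinite series (of Fourier coefficients of $Y$) can thus be rewritten as

$\theta(Y) \sim \int_0^1 \gamma_{\theta^0 \bullet \varphi}(\cdot)\, dg^0(\varphi).$

One should remark the important fact that, from one frequency to another,
the rotations used to build $\theta(Y)$ are not independent, which reflects the
fact that the coefficients $(\theta_\ell(Y))_\ell$ are highly correlated.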

2.2. Notations on Mixture models. Our study will make intensive use of
classical tools of mixture models, see for instance the papers of [GvdV01] or
[GW00]. We thus choose to keep some notations already used in such works.

To any vector $\theta \in \ell^2_{\mathbb{C}}(\mathbb{Z})$ corresponds a function $f \in L^2_{\mathbb{C}}([0,1])$ according
to equation (2.2), and for any measure $g \in \mathfrak{M}([0,1])$, $\mathbb{P}_{\theta,g}$ will refer to the
law of the vector of $\ell^2(\mathbb{Z})$ described by the location mixture of Gaussian
variables:

$\mathbb{P}_{\theta,g} := \int_0^1 \gamma_{\theta \bullet \varphi}(\cdot)\, dg(\varphi).$

This mixture model is of infinite dimension since $\theta$ belongs to $\ell^2(\mathbb{Z})$. Following
an obvious notation shortcut, $\mathbb{P}_{f,g}$ will be its equivalent for the functional
law on curves derived from $\mathbb{P}_{\theta,g}$. When $\theta$ is of finite length $k$, $p_{\theta,g}$ will be the
density with respect to the Lebesgue measure on $\mathbb{C}^k$ of the law $\mathbb{P}_{\theta,g}$:

$\forall z \in \mathbb{C}^k \qquad p_{\theta,g}(z) := \int_0^1 \gamma(z - \theta \bullet \varphi)\, dg(\varphi).$
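When $g$ is a discrete measure, the integral defining $p_{\theta,g}$ becomes a finite sum, which can be evaluated directly. The sketch below assumes $g = \sum_i w_i \delta_{\varphi_i}$ and stores $\theta$ as a dictionary indexed by frequency; both the representation and the function names are illustrative choices, not notation from the paper.

```python
import cmath
import math


def rotate(theta, phi):
    """Return theta . phi, whose entries are theta_ell * exp(-i 2 pi ell phi)."""
    return {ell: c * cmath.exp(-2j * math.pi * ell * phi) for ell, c in theta.items()}


def gaussian_density(z, mean):
    """Standard complex Gaussian density in dimension p: pi^{-p} exp(-||z - mean||^2)."""
    p = len(mean)
    sq_norm = sum(abs(z[ell] - mean[ell]) ** 2 for ell in mean)
    return math.pi ** (-p) * math.exp(-sq_norm)


def mixture_density(z, theta, atoms, weights):
    """p_{theta,g}(z) for the discrete measure g = sum_i weights[i] * delta_{atoms[i]}."""
    return sum(w * gaussian_density(z, rotate(theta, phi))
               for phi, w in zip(atoms, weights))


theta = {-1: 1 + 0j, 0: 2 + 0j, 1: 1 + 0j}   # hypothetical coefficients
atoms, weights = [0.0, 0.25], [0.5, 0.5]      # hypothetical two-point shift law
value = mixture_density(theta, theta, atoms, weights)
```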

We also use standard objects such as the Hellinger distance $d_H$ between
probability measures and the Total Variation distance $d_{TV}$, as well as
covering numbers of metric spaces such as $D(\epsilon, \mathcal{P}, d)$. These objects are precisely
described in Appendix A.

Bayesian frequentist consistency rate. In our setting, $d$ is chosen to be
one of the metrics introduced above ($d_H$ or $d_{TV}$) on the set

$\mathcal{P} := \left\{ \mathbb{P}_{f,g} \mid (f,g) \in \mathcal{H}_s \otimes \mathfrak{M}([0,1]) \right\}.$

We can now recall Theorem 2.1 of [GGvdV00], which will be useful for our
purpose.

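Theorem 2.1 (Posterior consistency and convergence rate, [GGvdV00]).
Assume that a sequence $(\epsilon_n)_n$ with $\epsilon_n \to 0$ and $n\epsilon_n^2 \to +\infty$, a constant
$C > 0$, and a sequence of sets $\mathcal{P}_n \subset \mathcal{P}$ satisfy

(2.3) $\log D(\epsilon_n, \mathcal{P}_n, d) \le n\epsilon_n^2$

(2.4) $\Pi_n(\mathcal{P} \setminus \mathcal{P}_n) \le e^{-n\epsilon_n^2 (C+4)}$

(2.5) $\Pi_n\left( \mathbb{P}_{f,g} \in \mathcal{P} \mid d_{KL}(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f,g}) \le \epsilon_n^2,\ V(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f,g}) \le \epsilon_n^2 \right) \ge e^{-n\epsilon_n^2 C}.$

Then there exists a sufficiently large $M$ such that

$\Pi_n\left( \mathbb{P}_{f,g} : d(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f,g}) \ge M\epsilon_n \mid Y_1, \ldots, Y_n \right) \to 0$

in $\mathbb{P}_{f^0,g^0}$ probability as $n \to +\infty$.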

The posterior concentration rate obtained in the above result is $\epsilon_n$. The
growing set $\mathcal{P}_n$ is referred to as a sieve over $\mathcal{P}$. Generally, this rate $\epsilon_n$ can
be compared to the classical frequentist benchmark: for instance, [GGvdV00]
obtained for the Log Spline model a contraction rate $\epsilon_n = n^{-s/(2s+1)}$ when
the unknown underlying density belongs to a Hölder class $\mathcal{C}^s([0,1])$, and
this rate is known to be the optimal one (in the sense that it is the
minimax one) in the frequentist paradigm over Hölder densities of regularity $s$
(see [IH81]). Similarly, the recent work of [RR12] considers the situation of
density estimation for infinite dimensional exponential families and also reaches
contraction rates close or equal to the known optimal frequentist one.

2.3. Bayesian prior and posterior concentration in the randomly shifted
curves model. We detail here the Bayesian prior $\Pi_n$ on $\mathcal{P}$ used to obtain
a polynomial concentration rate. Note that in our work such a prior will be
independent of the unknown smoothness parameter $s$. As pointed out in the
paragraph above, it is sufficient to define some prior on the space $\mathcal{H}_s \otimes
\mathfrak{M}([0,1])$, since equation (2.1) will then transport this prior to a law on
$\mathcal{P}$. The two parameters $f$ and $g$ are picked independently at random following
the next prior distributions.

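Prior on f. The prior on $f$ is slightly adapted from [RR12]. It is defined
on $\mathcal{H}_s$ through

$\pi := \sum_{\ell \ge 1} \lambda(\ell)\, \pi_\ell.$

Given any integer $\ell$, the idea is to decide to randomly switch on, with probability
$\lambda(\ell)$, all the Fourier frequencies from $-\ell$ to $+\ell$. Then, $\pi_\ell$ is a distribution
defined on $\ell^2(\mathbb{Z})$ such that $\pi_\ell := \otimes_{k \in \mathbb{Z}} \pi_\ell^k$ and

$\forall k \in \mathbb{Z} \qquad \pi_\ell^k = 1_{|k| > \ell}\, \delta_0 + 1_{|k| \le \ell}\, \mathcal{N}_{\mathbb{C}}(0, \xi_n^2).$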

The randomisation of the selected frequencies is done using $\lambda$, a probability
distribution on $\mathbb{N}^*$ which satisfies for $\rho \in (1,2)$:

The prior $\pi$ depends on the variance $\xi_n^2$ of the Gaussian laws used to sample
the Fourier coefficients. In the sequel, we use a variance that depends on $n$
according to

(2.6) $\xi_n^2 := n^{-\mu_s} (\log n)^{-\zeta},$

where $\mu_s$ and $\zeta$ are parameters that may depend on $s$ (non adaptive prior)
or not (adaptive prior).
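Sampling from this prior takes two steps: draw a frequency threshold from $\lambda$, then draw the active coefficients from $\mathcal{N}_{\mathbb{C}}(0, \xi_n^2)$. The sketch below is only an illustration: the weights passed as `lam` are a hypothetical finitely supported choice and are not claimed to satisfy the tail condition required of $\lambda$, and $\mathcal{N}_{\mathbb{C}}(0, \xi_n^2)$ is read as a complex Gaussian whose real and imaginary parts each have variance $\xi_n^2/2$.

```python
import math
import random


def prior_variance(n, mu, zeta):
    """xi_n^2 = n^{-mu} (log n)^{-zeta}, following (2.6); requires n > 1."""
    return n ** (-mu) * math.log(n) ** (-zeta)


def sample_prior_f(n, mu, zeta, lam, rng):
    """Draw Fourier coefficients from pi = sum_ell lam(ell) pi_ell.

    lam: dict {ell: probability} over positive integers (illustrative weights).
    Frequencies outside [-ell, ell] are zero and simply omitted from the result.
    """
    levels = sorted(lam)
    ell = rng.choices(levels, weights=[lam[k] for k in levels])[0]
    sd = math.sqrt(prior_variance(n, mu, zeta) / 2.0)  # per real/imaginary part
    return {k: complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))
            for k in range(-ell, ell + 1)}


rng = random.Random(0)
lam = {1: 0.6, 2: 0.3, 3: 0.1}  # hypothetical weights, for illustration only
coefficients = sample_prior_f(n=1000, mu=0.25, zeta=1.5, lam=lam, rng=rng)
```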
Prior on g. As our model is not so far from a Gaussian mixture
model, a natural prior on $g$ is built according to a Dirichlet process, following
the ideas of [GvdV01]. Given any finite base measure $\alpha$ that has a positive
continuous density on $[0,1]$ w.r.t. the Lebesgue measure, the Dirichlet process
$D_\alpha$ generates a random probability measure $g$ on $[0,1]$. For any finite
partition $(A_1, \ldots, A_k)$ of $[0,1]$, the probability vector $(g(A_1), \ldots, g(A_k))$ on the
$k$-dimensional simplex has a Dirichlet distribution $Dir(\alpha(A_1), \ldots, \alpha(A_k))$.
Such a process may be built according to the Stick-Breaking construction (see
for instance [Fer73]).
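A truncated version of the stick-breaking construction gives a simple way to draw an approximate realization of such a random measure. The sketch below assumes the base measure is $\alpha = M \cdot \mathrm{Uniform}[0,1]$ with total mass $M$ (one admissible choice, made here for convenience) and truncates the construction at a fixed number of atoms, assigning the leftover mass to the last atom; an exact draw would require infinitely many atoms.

```python
import random


def stick_breaking(total_mass, num_atoms, rng):
    """Truncated stick-breaking draw of g ~ D_alpha with alpha = total_mass * Uniform[0, 1].

    Returns (atoms, weights); the weights are nonnegative and sum to 1.
    """
    atoms, weights = [], []
    remaining = 1.0
    for _ in range(num_atoms - 1):
        v = rng.betavariate(1.0, total_mass)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        atoms.append(rng.random())            # atom location drawn from alpha / total_mass
        remaining *= 1.0 - v
    # Truncation step: the last atom receives all the leftover mass.
    weights.append(remaining)
    atoms.append(rng.random())
    return atoms, weights


rng = random.Random(0)
atoms, weights = stick_breaking(total_mass=2.0, num_atoms=50, rng=rng)
```

The pair (atoms, weights) describes a discrete probability measure on $[0,1]$, which is consistent with the fact that realizations of a Dirichlet process are almost surely discrete.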

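2.4. Main result. Using the prior defined above, we obtain the following
theorem on the random SIM.

Theorem 2.2. Assume that $f^0 \in \mathcal{H}_s$ with $s \ge 1$; then the values $\mu_s =
2/(2s+2)$ and $\zeta = 0$ in the definition of $\xi_n$ yield a non adaptive prior such
that

$\Pi_n\left\{ \mathbb{P}_{f,g} \text{ s.t. } d_H(\mathbb{P}_{f,g}, \mathbb{P}_{f^0,g^0}) \le M\epsilon_n \mid Y_1, \ldots, Y_n \right\} = 1 + o_{\mathbb{P}_{f^0,g^0}}(1)$

when $n \to +\infty$, for a sufficiently large constant $M$. Moreover,
the contraction rate $\epsilon_n$ is given by

$\epsilon_n = n^{-s/(2s+2)} \log n.$

The values $\mu = 1/4$ and $\zeta = 3/2$ yield the contraction rate

$\Pi_n\left\{ \mathbb{P}_{f,g} \text{ s.t. } d_H(\mathbb{P}_{f,g}, \mathbb{P}_{f^0,g^0}) \le M\epsilon_n \mid Y_1, \ldots, Y_n \right\} = 1 + o_{\mathbb{P}_{f^0,g^0}}(1)$

for a sufficiently large constant $M$, when $n \to +\infty$, with

$\epsilon_n = n^{-s/(2s+2)} \log n$ if $s \in [1,3]$, and $\epsilon_n = n^{-3/8} \log n$ if $s \ge 3$.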

Let us briefly comment on this result. It first describes the posterior
concentration around some neighbourhood of the true law $\mathbb{P}_{f^0,g^0}$ at a
polynomial rate. Our prior is adaptive with respect to the regularity $s$ as soon as $s \in [1,3]$,
setting $\xi_n^2 = n^{-1/4}(\log n)^{-3/2}$. For this range of $s$, the convergence rate is
$n^{-s/(2s+2)}$ up to a logarithmic term. To the best of our knowledge, the
minimax frequentist rate is unknown for the problem of recovering $\mathbb{P}_{f^0,g^0}$ when
both $f^0$ and $g^0$ are unknown. An interpretation of such a polynomial rate is
rather difficult to provide. The exponent may be read as $-s/(2s+d)$ where $d$
is the number of dimensions to estimate in the model ($f^0$ and $g^0$). When $s$
becomes larger than 3, the rate of Theorem 2.2 is "blocked" at $3/8$ (which
corresponds to $s/(2s+2)$ when $s = 3$) and does not match $s/(2s+2)$.
This difficulty is mainly due to the important condition $w_\epsilon^2 \lesssim \ell_\epsilon$ in Theorem
3.1.

Finally, the non adaptive prior based on $\xi_n^2 = n^{-2/(2s+2)}$ recovers the good
rate $-s/(2s+2)$ for all $s$ larger than 1.
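To make the comparison between the two priors concrete, the following sketch computes the polynomial exponent described above: $s/(2s+2)$ for the non adaptive prior, and the same quantity capped at its value for $s = 3$ (that is, $3/8$) for the adaptive one. The function names are hypothetical, and the logarithmic factor is taken to be $\log n$ as in the statement of the rate.

```python
import math


def rate_exponent(s, adaptive):
    """Exponent a such that eps_n = n^{-a} log n, for regularity s >= 1."""
    if s < 1:
        raise ValueError("the result is stated for s >= 1")
    s_eff = min(s, 3.0) if adaptive else s  # the adaptive prior saturates at s = 3
    return s_eff / (2.0 * s_eff + 2.0)


def contraction_rate(n, s, adaptive):
    """eps_n = n^{-s_eff/(2 s_eff + 2)} log n."""
    return n ** (-rate_exponent(s, adaptive)) * math.log(n)


for s in (1.0, 2.0, 3.0, 5.0):
    print(s, rate_exponent(s, adaptive=True), rate_exponent(s, adaptive=False))
```

For $s \le 3$ both priors give the same exponent; beyond that, only the non adaptive prior keeps improving with $s$.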

The former result concerns the law $\mathbb{P}_{f,g} \in \mathcal{P}$. It is also possible
to derive a second result on the objects $f \in \mathcal{H}_s$ themselves. This result is
studied in [BG13] and provides a rather weak result on the posterior
convergence towards the true objects $f^0$ and $g^0$.

3. Metric description of the model. We aim to check conditions
(2.4) and (2.5) and then apply Theorem 2.1. To this end, we first define in
section 3.1 a sieve $\mathcal{P}_{\ell_n,w_n}$, and our goal is to find some optimal calibration
of $\epsilon_n$, $\ell_n$ and $w_n$ with respect to $n$. We thus need to find a lower bound on
the prior mass of some Kullback-Leibler neighbourhood of $\mathbb{P}_{f^0,g^0} \in \mathcal{P}$.
These sets are defined as

$\mathcal{V}_{\epsilon_n}(\mathbb{P}_{f^0,g^0}, d_{KL}) = \left\{ \mathbb{P}_{f,g} \in \mathcal{P} \mid d_{KL}(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f,g}) \le \epsilon_n^2,\ V(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f,g}) \le \epsilon_n^2 \right\}.$

This will in fact be done by considering Hellinger neighbourhoods instead of
Kullback-Leibler ones. A link between these two kinds of neighbourhood is
given in section 3.2. In section 3.3, we work with the Hellinger
neighbourhoods to exhibit some admissible sizes for $\epsilon_n$, $\ell_n$ and $w_n$. Finally, we prove
Theorem 2.2 in section 4.1.

Throughout this section, we defer most technical proofs to the Appendix.

3.1. Entropy estimates. We first establish some useful results on the
complexity of our model $\mathbb{P}_{f,g}$ when $f \in \mathcal{H}_s$ and $g \in \mathfrak{M}([0,1])$ in various situations
($f$ known, unknown, parametric or not).

3.1.1. Case of known f. We first give some useful results when $f$ is known
and belongs to a finite dimensional vector space (the number of active Fourier
coefficients is restricted to $2\ell + 1$ for a given $\ell$). Then $\ell$ will be allowed to
grow with $n$ and depend on a parameter $\epsilon$ introduced below. Hence, $f$ is
described by the parameter $\theta = (\theta_{-\ell}, \ldots, \theta_0, \ldots, \theta_\ell)$, and we define the set
of all possible Gaussian measures

$\mathcal{A}_\theta := \left\{ \gamma_{\theta \bullet \varphi},\ \varphi \in [0,1] \right\}.$

Following the arguments of [GW00], it is possible to establish the following
preliminary result.
Proposition 3.1. For any sequence $\theta \in \mathbb{C}^{2\ell+1}$, one has



^' e 
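where $o(1)$ goes to zero independently of $\ell$ and $\theta$ as $\epsilon \to 0$, and

$\log N(\epsilon, \mathcal{A}_\theta, d_H) \lesssim \log \ell + \log \|\theta\|_{\mathcal{H}_1} + \log\frac{1}{\epsilon}.$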

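Assume now that $g$ possesses a finite number $k$ of points in its support;
one can deduce from the proposition above a simple corollary that exploits
the complexity of the simplex of dimension $k - 1$ (see for instance the proof
of Lemma 2 in [GW00]).

Proposition 3.2. Assume that $f$ is parametric and known ($\theta \in \mathbb{C}^{2\ell+1}$)
and define

$\mathcal{M}^k_\theta := \left\{ \sum_{i=1}^k g(\varphi_i)\, \gamma_{\theta \bullet \varphi_i} \;:\; \varphi_i \in [0,1],\ g(\varphi_i) \ge 0,\ \forall i \in [\![1,k]\!] \text{ and } \sum_{i=1}^k g(\varphi_i) = 1 \right\}$

for a number of components $k$ that may depend on $\epsilon$ (as $\ell$ does). Then

$H_{[]}(\epsilon, \mathcal{M}^k_\theta, d_H) \lesssim k \left( \log \ell + \log \|\theta\|_{\mathcal{H}_1} + \log\frac{1}{\epsilon} \right).$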

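We then naturally provide a description of the situation when $f$ is known
and parametrized by an infinite sequence $\theta \in \ell^2(\mathbb{Z})$. According to the
previous computations, and using a truncation argument at frequency $\ell_\epsilon = \epsilon^{-1/s}$
in the Sobolev space $\mathcal{H}_s$, one can show the following result.

Corollary 1. Assume $f \in \mathcal{H}_s$ known for $s \ge 1$ ($\theta := \theta(f)$ such that
$\sum_{j \in \mathbb{Z}} |j|^{2s} |\theta_j|^2 < +\infty$); using the same set $\mathcal{A}_\theta$ as in Proposition 3.1 with
$\ell_\epsilon = \epsilon^{-1/s}$, then

$H_{[]}(\epsilon, \mathcal{A}_\theta, d_H) \lesssim \frac{s+1}{s} \log\frac{1}{\epsilon} + \log \|\theta\|_{\mathcal{H}_1}.$

Similarly, one also has

$H_{[]}(\epsilon, \mathcal{M}^k_\theta, d_H) \lesssim k \left( \frac{s+1}{s} \log\frac{1}{\epsilon} + \log \|\theta\|_{\mathcal{H}_1} \right).$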
The next step is to consider a continuous mixture for $g$, which is the more
natural case. For $f$ known, let

$\mathcal{P}_f := \left\{ \mathbb{P}_{f,g} \mid g \in \mathfrak{M}([0,1]) \right\}.$

Once again, we will only consider functions $f$ with null Fourier coefficients
of order higher than $\ell_\epsilon$. For the sake of simplicity, we will omit the dependence
on $\epsilon$ and use the notation $\ell$.

It would be quite tempting to use the results of [GvdV01] to bound the
bracketing entropy of $\mathcal{P}_f$ but, as pointed out by [MM11], applying directly
the bounds obtained in Lemma 3.1 and Lemma 3.2 of [GvdV01] to our setting
yields too weak a result: the upper bound on $H_{[]}(\epsilon, \mathcal{P}_f, d_H)$ would
depend too strongly on $\ell$. We therefore have to carefully adapt
the approach of [GvdV01] to obtain a sufficiently sharp upper bound on the
entropy of $\mathcal{P}_f$. Such a bound is given in the next result, in which we provide
an upper bound on the entropy with respect to the Total Variation distance,
which is easier to handle here. Note that all the previous results are still
true if we use $d_H$ instead of $d_{TV}$, since (A.2) also allows one to retrieve entropy
bounds for $d_H$ from entropy bounds for $d_{TV}$.

Proposition 3.3. Let $\epsilon > 0$ and $s > 0$; if $\log\frac{1}{\epsilon} \le \ell$ and $f \in \mathcal{H}^\ell$ is such
that $\|\theta\|^2 \le 2\ell + 1$, then



If furthermore $w \le \sqrt{2\ell + 1}$ then

sup



The second inequality opens the way for the case of unknown $f$ given
below. It is possible since in the first inequality we have carefully expressed
the dependency on $f$ and $\ell$.

The method to build an $\epsilon$-covering of $\mathcal{P}_f$ follows two natural steps:

• approximate any mixture $g$ by a finite one $\tilde g$ such that

$d_{TV}(\mathbb{P}_{f,g}, \mathbb{P}_{f,\tilde g}) \le \epsilon/2,$

with a number of components of the finite mixture $\tilde g$ uniformly bounded
in $g$ (depending on $f$ and $\epsilon$);
• use Proposition 3.2 for the finite mixture to approximate $\mathbb{P}_{\theta,\tilde g}$ well.

The proof itself is deferred to the Appendix.
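3.1.2. Case of unknown f. We now describe the picture when $f$ is
unknown, which is the main objective of this paper. We assume that $f$ belongs
to $\mathcal{H}_s$. In order to bound the bracketing entropy, we define a sieve over $\mathcal{H}_s$
which depends on a frequency cut-off $\ell$ and a size parameter $w$. We then set

$\mathcal{P}_{\ell,w} := \left\{ \mathbb{P}_{f,g} \mid f \in \mathcal{H}^\ell,\ \|\theta(f)\| \le w,\ g \in \mathfrak{M}([0,1]) \right\}.$

Theorem 3.1. Let $\epsilon > 0$ be small enough, and assume that $\ell_\epsilon$ and
$w_\epsilon$ are such that $\log\frac{1}{\epsilon} \le \ell_\epsilon$ and $w_\epsilon \le \sqrt{2\ell_\epsilon + 1}$; then

$\log N(\epsilon, \mathcal{P}_{\ell_\epsilon,w_\epsilon}, d_{TV}) \lesssim \ell_\epsilon^2 \left( \log \ell_\epsilon + \log\frac{1}{\epsilon} \right).$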

The proof of Theorem 3.1 is based on two simple results. The first one is
the Girsanov formula obtained by [BG10] in appendix A.2.2 (in the case of
known $g$): it can be extended to the situation of unknown $g$ and complex
trajectories as in (2.1), which leads to

(3.1) $\frac{d\mathbb{P}_{f,g}}{d\mathbb{P}_{f^0,g^0}}(Y) = \frac{\int_0^1 \exp\left( 2\Re e\langle f^{-\alpha_1}, dY\rangle - \|f^{-\alpha_1}\|^2 \right) dg(\alpha_1)}{\int_0^1 \exp\left( 2\Re e\langle f^{0,-\alpha_2}, dY\rangle - \|f^{0,-\alpha_2}\|^2 \right) dg^0(\alpha_2)},$

for any measurable trajectory $Y$.

The second result is given in the following lemma. 

Lemma 3.1. Let f and f be any functions in L^([0, 1]), g be any shift 
distribution in 9K([0, 1]), then 

ciTv(F/,„P/,,)<^^. 

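Proof of Theorem 3.1. The idea of the proof is to build an
$\epsilon$-covering of $\mathcal{P}_{\ell,w}$ from $\epsilon/2$-coverings for $f$ and $g$. First, let $\mathbb{P}_{f,g}$ and $\mathbb{P}_{\tilde f,\tilde g}$ be two
elements of $\mathcal{P}_{\ell,w}$ and remark that by the triangle inequality

$d_{TV}(\mathbb{P}_{f,g}, \mathbb{P}_{\tilde f,\tilde g}) \le d_{TV}(\mathbb{P}_{f,g}, \mathbb{P}_{\tilde f,g}) + d_{TV}(\mathbb{P}_{\tilde f,g}, \mathbb{P}_{\tilde f,\tilde g}).$

We will look for a covering method that uses the inequality above and
a tensorial argument; it requires a bound on both terms. The bound on
the first one comes from Lemma 3.1. The second term is handled uniformly
in $f$ by Proposition 3.3.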

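Now, we build $\epsilon/2$-coverings of $\mathbb{P}_{f,g}$ for fixed $g$ from an $\epsilon/\sqrt{2}$-covering of
$f$ for the $L^2$-norm:

$\log N\left( \epsilon/\sqrt{2},\ \{ f \in \mathcal{H}^{\ell_\epsilon},\ \|\theta(f)\| \le w_\epsilon \},\ \|\cdot\| \right) \lesssim \ell_\epsilon \log\frac{w_\epsilon}{\epsilon}.$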
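□

According to inequality (A.2) and since $\log\frac{1}{\epsilon^2} \lesssim \log\frac{1}{\epsilon}$, we can easily deduce
the next corollary.

Corollary 2. Let $\epsilon > 0$ be small enough, and assume $\log\frac{1}{\epsilon} \le \ell_\epsilon$
and $w_\epsilon \le \sqrt{2\ell_\epsilon + 1}$; then

$\log N(\epsilon, \mathcal{P}_{\ell_\epsilon,w_\epsilon}, d_H) \lesssim \ell_\epsilon^2 \left( \log \ell_\epsilon + \log\frac{1}{\epsilon} \right).$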

Remark 3.1. i) Even if the model studied here is a very special case
of Gaussian mixture models, one may think that this kind of result may
help the analysis of more general mixture cases within a growing dimension
setting.

ii) In our case, we will use a much larger choice of $\ell_\epsilon$ than $\log\frac{1}{\epsilon}$. This
choice will be fixed in section 4.1.

3.2. Link between Kullback-Leibler and Hellinger neighbourhoods. We first
recall a useful result of Wong & Shen given as Theorem 5 in [WS95]. It
enables us to handle Hellinger neighbourhoods instead of $\mathcal{V}_{\epsilon_n}(\mathbb{P}_{f^0,g^0}, d_{KL})$, which
is generally easier for mixture models.

Theorem 3.2 (Wong & Shen). Let $\mu$ and $\nu$ be two measures such that $\mu$
is a.c. with respect to $\nu$ with a density $q = d\mu/d\nu$. Assume that $d_H(\mu,\nu)^2 =
\int (\sqrt{q} - 1)^2 d\nu \le \epsilon^2$ and that there exists $\delta \in (0,1]$ such that

(3.2) $M_\delta^2 := \int_{q \ge e^{1/\delta}} q^{\delta+1}\, d\nu < \infty.$

Then, for $\epsilon$ small enough, there exists a universal constant $C$ large enough
such that

$d_{KL}(\mu,\nu) = \int q \log q\, d\nu \le C \log(M_\delta)\, \epsilon^2 \log\frac{1}{\epsilon},$

and

$V(\mu,\nu) \le \int q \log^2 q\, d\nu \le C \log(M_\delta)^2\, \epsilon^2 \left( \log\frac{1}{\epsilon} \right)^2.$

Hence, Hellinger neighbourhoods are almost Kullback-Leibler ones (up to
some logarithmic terms) provided that a sufficiently large moment exists for $q$
($q \log q$ is dominated by $q^{1+\delta}$ for large values of $q$, and a second order expansion of
$q \log q - q + 1$ around 1 yields a term similar to $[\sqrt{q} - 1]^2$). The next proposition
shows that condition (3.2) is satisfied in our SIM.
Proposition 3.4. For any $\mathbb{P}_{f^0,g^0} \in \mathcal{P}$, and for any $f \in \mathcal{H}_s$ such that

< 2||/'^||, and any g G 9JT([0, 1]), define q = . There exists 

$\delta \in (0,1]$ such that the constant $M_\delta$ defined in equation (3.2) is uniformly
bounded with respect to $f$.

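3.3. Hellinger neighbourhoods. Proposition 3.4 will allow us to use Theorem
3.2; thus we now aim to find a lower bound on the prior mass of Hellinger neighbourhoods
of $\mathbb{P}_{f^0,g^0}$. Consider a frequency cut-off $\ell_n$ that will be fixed later. For any
$f \in \mathcal{H}^{\ell_n}$ and $g \in \mathfrak{M}([0,1])$, recall that we denote $\theta := \theta(f)$ as well as
$\theta^0 = \theta(f^0)$. We define $f^0_{\ell_n}$ as the projection of $f^0$ on the subspace $\mathcal{H}^{\ell_n}$.

For the sake of simplicity, $\mathbb{E}_0 F(Y)$ will refer to the expectation of a function
$F$ of the trajectory $Y$ when $Y$ follows $\mathbb{P}_{f^0,g^0}$. The triangle inequality applied
to the Hellinger distance shows that

$d_H(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f,g}) \le \underbrace{d_H(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f^0_{\ell_n},g^0})}_{(E_1)} + \underbrace{d_H(\mathbb{P}_{f^0_{\ell_n},g^0}, \mathbb{P}_{f^0_{\ell_n},g})}_{(E_2)} + \underbrace{d_H(\mathbb{P}_{f^0_{\ell_n},g}, \mathbb{P}_{f,g})}_{(E_3)}.$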

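In the sequel, we will provide sufficiently sharp upper bounds on $(E_1)$, $(E_2)$ and
$(E_3)$ so that we will be able to find a suitable lower bound on the prior mass
of Hellinger neighbourhoods.

Upper bound of $(E_1)$. We first bound $(E_1)$ using $d_H^2 \le d_{KL}$ with the
Girsanov formula (3.1):

$(E_1) \le \sqrt{d_{KL}(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f^0_{\ell_n},g^0})} = \sqrt{ -\mathbb{E}_0 \log \frac{\int_0^1 \exp\left( 2\Re e\langle f^{0,-\alpha}_{\ell_n}, dY\rangle - \|f^0_{\ell_n}\|^2 \right) dg^0(\alpha)}{\int_0^1 \exp\left( 2\Re e\langle f^{0,-\alpha}, dY\rangle - \|f^0\|^2 \right) dg^0(\alpha)} } =: (A).$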

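We now obtain the upper bound of $(E_1)$ according to the next proposition.

Proposition 3.5. Assume that $Y \sim \mathbb{P}_{f^0,g^0}$ and $f^0 \in \mathcal{H}_s$; then

$(E_1) \le (A) \le \sqrt{2}\, \|f^0 - f^0_{\ell_n}\| \le \sqrt{2}\, \|f^0\|_{\mathcal{H}_s}\, \ell_n^{-s}.$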

Upper bound of $(E_3)$. We will be interested in the Hellinger distance when
$f^0_{\ell_n}$ is close to $f$, and the dimension $\ell_n$ grows up to $+\infty$ (the mixture law
on $[0,1]$ is the same for the two laws). The important fact will be its
exclusive dependence on the distance between $f^0_{\ell_n}$ and $f$. This
upper bound is given in the next proposition, whose proof is immediate from
Lemma 3.1 and equation (A.2).

Proposition 3.6. Assume that $f \in \mathcal{H}^{\ell_n}$ and $g \in \mathfrak{M}([0,1])$; then
$d_H(\mathbb{P}_{f^0_{\ell_n},g}, \mathbb{P}_{f,g}) \le 2^{1/4}\, \|f - f^0_{\ell_n}\|^{1/2}.$

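Upper bound for $(E_2)$. This term is clearly the most difficult to handle. We
will obtain a convenient result using some elements obtained in Proposition
3.3. For a given $\epsilon_n > 0$, $\ell_n$, $f^0_{\ell_n} \in \mathcal{H}^{\ell_n}$ and $g^0 \in \mathfrak{M}([0,1])$, we know that one
may find a mixture model $\tilde g$ such that $d_H(\mathbb{P}_{f^0_{\ell_n},g^0}, \mathbb{P}_{f^0_{\ell_n},\tilde g}) \le \epsilon_n$ and $\tilde g$ has
$C\ell_n^2$ points of support in $[0,1]$ as soon as $\epsilon_n$ is small enough and $\log\frac{1}{\epsilon_n} \le \ell_n$
(the condition $\|\theta^0\|^2 \le 2\ell_n + 1$ is immediate since $\theta^0$ does not depend on
$n$). The next step is to control the Hellinger distance $d_H(\mathbb{P}_{f^0_{\ell_n},g}, \mathbb{P}_{f^0_{\ell_n},\tilde g})$ for
$g \in \mathfrak{M}([0,1])$, and this can be done thanks to an adaptation in dimension
$2\ell_n + 1$ of Lemma 5.1 of [GvdV01].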

Lemma 3.2. Let $\tilde g$ be a discrete mixture law whose support is of
cardinal $J$ and whose support points $(\varphi_j)_{j=1 \ldots J}$ are such that $\tilde g(\varphi_j) = p_j$ and are
$\eta$-separated, i.e. $|\varphi_j - \varphi_i| \ge \eta$, $\forall i \ne j$; then $\forall g \in \mathfrak{M}([0,1])$

dli^flra^^fl,^) < JlWfiJn.V + 2Y,m^j - W2, + r]/2]) - ~g{^,)\ . 

j=i 

In [BG13], we will show that it is possible to obtain a more general upper
bound for the Hellinger distance between $\mathbb{P}_{f^0,g}$ and $\mathbb{P}_{f^0,\tilde g}$ which involves the
Wasserstein distance $W_1(g, \tilde g)$ between $g$ and $\tilde g$, but such an upper bound is
slightly less powerful than the one given by the former lemma. Note that
Lemma 3.2 needs a discrete mixture with $\eta$-separated support points. The
following result allows us to obtain such a mixture.

Proposition 3.7. Assume that $f^0 \in \mathcal{H}_s$ for $s \ge 1$, $g^0 \in \mathfrak{M}([0,1])$, and
$\log\frac{1}{\epsilon_n} \lesssim \ell_n$. For any $\eta_n \le \epsilon_n^2$, there exists a discrete distribution $\tilde g$ with at most
$J_n \lesssim \ell_n^2$ support points, denoted $(\psi_j)_{j=1\ldots J_n}$, such that these points
are $\eta_n$-separated, and

Furthermore, for any $g \in \mathfrak{M}([0,1])$,



+ 




In [BG13] we obtain a more general upper bound for $(E_2)$, based on the
Wasserstein distance. We could use it to retrieve Proposition 3.7, but it also
leads to Hellinger neighbourhoods described in terms of the Total Variation
distance from $g$ to $g^0$. This last distance is adapted to smooth densities $g$
but not to the ones considered here, where the prior distribution for $g$ is a
Dirichlet process.

Description of a Hellinger neighbourhood. We can now gather the upper
bounds of $(E_1)$, $(E_2)$ and $(E_3)$ to get the following result.

Proposition 3.8. Assume that $f^0 \in \mathcal{H}_s$ for $s \ge 1$ and $g^0 \in \mathfrak{M}([0,1])$.
Choose the threshold $\ell_n$ such that $\epsilon_n^{-1/s} \lesssim \ell_n \lesssim \epsilon_n^{-1/s}$ and $\eta_n := \epsilon_n^2$, and consider
the finite mixture $\tilde g$ provided by Proposition 3.7. Define



Then, there exists a constant Cq depending only on such that for any 



4. Proof of Theorem 2.2. We will prove this result using the "toolbox"
provided by Theorem 2.1. We thus check its applicability and consider
each of its hypotheses.

4.1. Checking the conditions of Theorem 2.1. We first establish the
lower bound (2.5), which is necessary to apply Theorem 2.1.

Proposition 4.1. Assume that $f^0 \in \mathcal{H}_s$ for $s \ge 1$ and $g^0 \in \mathfrak{M}([0,1])$.
For any sequence $(\epsilon_n)_{n\in\mathbb{N}}$ which converges to $0$ as $n \to +\infty$, and for the
prior defined in paragraph 2.3, there exists a constant $c > 0$ such that




$g \in \mathcal{G}_{\epsilon_n}$ and $f \in \mathcal{F}_{\epsilon_n}$,
where

h — k'/'{iog(i/6„))''+2/-v^;r2i 

Proposition 4.1 relies on Theorem 3.2, which allows us to use Hellinger
neighbourhoods instead of $\mathcal{V}_{\epsilon_n}(\mathbb{P}_{f^0,g^0}, d_{KL})$, and on Proposition 3.8, which
describes suitable Hellinger neighbourhoods. To control their prior mass, we
recall the following useful result, which appears as Lemma 6.1 of [GGvdV00].
It gives a lower bound on the mass of an $\ell_1$-ball of radius $r$ under a Dirichlet
prior.

Lemma 4.1 ([GGvdV00]). Let $r > 0$ and $(X_1, \ldots, X_N)$ be distributed
according to the Dirichlet distribution on the $\ell_1$ simplex of dimension $N - 1$
with parameters $(m, \alpha_1, \ldots, \alpha_N)$. Assume that $\sum_j \alpha_j = m$ and $A r^b \le \alpha_j \le 1$
for some constants $A$ and $b$. Let $(x_1, \ldots, x_N)$ be any point of the $N$-simplex.
Then there exist $c$ and $C$ that only depend on $A$ and $b$ such that, if $r \le 1/N$,

\[ \Pr\left( \sum_{j=1}^{N} |X_j - x_j| \le 2r \right) \ge C \exp\left( -cN \log\frac{1}{r} \right). \]

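In the proof of Proposition 4.1 (deferred to the Appendix), one can see
that we obtain a suitable lower bound as soon as $\lambda(\ell_n) \ge e^{-c\,\ell_n^2 \log^\rho \ell_n}$
for a constant $c$. Of course, a distribution $\lambda$ with a heavier tail would
also suit here. However, such a heavier tail is not suitable for the control of
the term $\Pi_n(\mathcal{P} \setminus \mathcal{P}_{k_n,w_n})$, which is detailed in the next proposition.

Proposition 4.2. For any sequences $k_n \to +\infty$ and $\epsilon_n \to 0$ as $n \to +\infty$,
define $w_n^2 = 4k_n + 2$. Then there exists a constant $c$ such that

\[ \Pi_n\left(\mathcal{P} \setminus \mathcal{P}_{k_n,w_n}\right) \le e^{-c\left[k_n^2 \log^\rho(k_n) \,\wedge\, k_n n^{\mu_s}(\log n)^{\zeta}\right]}, \]

and

\[ \log D(\epsilon_n, \mathcal{P}_{k_n,w_n}, d_H) \lesssim k_n^2 \left[ \log k_n + \log\frac{1}{\epsilon_n} \right]. \]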

We are now able to conclude the proof of the posterior consistency.

Proof of Theorem 2.2. Take $\epsilon_n := n^{-\alpha}(\log n)^{\kappa}$ and $k_n := n^{\beta}(\log n)^{\gamma}$.
From our definition (2.6), the inverse variance of the Gaussian prior equals
$n^{\mu_s}(\log n)^{\zeta}$, and we look for
admissible values of $\alpha$, $\beta$, $\kappa$, $\gamma$, $\mu_s$ and $\zeta$ in order to satisfy (2.3), (2.4) and
(2.5).

Proposition 4.1 shows that (2.5) is satisfied provided that

\[ \epsilon_n^{-2/s} \left(\log\frac{1}{\epsilon_n}\right)^{\rho + 2/s} \vee\, n^{\mu_s}(\log n)^{\zeta} \lesssim n\epsilon_n^2 = n^{1-2\alpha}(\log n)^{2\kappa}. \]

This is true as soon as $\epsilon_n$ satisfies

\[ \alpha \le \frac{s}{2s+2} \quad\text{and}\quad \kappa \ge \frac{\rho s + 2}{2s + 2}. \]

Moreover, we obtain the first condition on $\mu_s$: $\mu_s \le 1 - 2\alpha$, and if $\mu_s = 1 - 2\alpha$
then $\zeta \le 2\kappa$.

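Now, Proposition 4.2 shows that (2.3) is fulfilled provided that

\[ \text{(4.1)}\qquad k_n^2 \left[ \log k_n + \log\frac{1}{\epsilon_n} \right] \lesssim n\epsilon_n^2 = n^{1-2\alpha}(\log n)^{2\kappa}. \]

This condition is satisfied when $2\beta \le 1 - 2\alpha$ and $2\gamma + 1 \le 2\kappa$. At last,
Proposition 4.2 again ensures that (2.4) is true as soon as

\[ k_n^2 \log^\rho k_n \,\wedge\, k_n n^{\mu_s}(\log n)^{\zeta} \gtrsim n\epsilon_n^2, \]

and we deduce from (4.1) that

\[ 2\beta = 1 - 2\alpha \quad\text{and}\quad -\rho/2 + \kappa \le \gamma \le -1/2 + \kappa. \]

Moreover, we also see that $\beta + \mu_s \ge 1 - 2\alpha$, hence $\mu_s \ge 1/2 - \alpha$, and if
$\mu_s = 1/2 - \alpha$ then $\gamma + \zeta \ge 2\kappa$. Since $\alpha \le s/(2s+2)$, this yields
$\mu_s \ge 1/2 - \alpha \ge \frac{1}{2s+2}$ (which naturally drives us to set $\mu_s = 1/4$ (case $s = 1$) for
the adaptive prior).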

We split the proof according to the adaptive or non adaptive case.

Adaptive prior. We first set $\mu$ independent of $s$ and equal to $1/4$. For any
$s \in [1,3]$, we see that $\alpha(s) = s/(2s+2)$ is the largest admissible value of
$\alpha$, and $\alpha(s) = 3/8 < s/(2s+2)$ as soon as $s > 3$. The corresponding value
of $\beta$ is $1/(2s+2)$ when $s \in [1,3]$ and $\beta = 1/8$ otherwise. Any choice of
$\zeta \in [3/2, 2)$ permits to deal with the conditions on $\zeta$ that appear when
$s = 1$ or $s \ge 3$. The other values of $\gamma$ and $\kappa$ may be determined with respect
to $\rho$. For instance, if we choose $\rho \in (1,2)$, we can take $\kappa = 1$ and $\gamma = 1/2$.
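The calibration of the adaptive case can be tabulated with exact arithmetic. The sketch below only encodes the constraints stated in this proof ($\alpha \le s/(2s+2)$, $2\beta = 1-2\alpha$, $1/2-\alpha \le \mu \le 1-2\alpha$ with $\mu = 1/4$); the function names are hypothetical and the logarithmic exponents are ignored.

```python
from fractions import Fraction

MU_ADAPTIVE = Fraction(1, 4)  # value of mu chosen in the adaptive case


def adaptive_exponents(s):
    """Return (alpha, beta) for smoothness s >= 1 with mu fixed to 1/4."""
    s = Fraction(s)
    if s < 1:
        raise ValueError("the proof assumes s >= 1")
    # alpha <= s/(2s+2) comes from (2.5); mu <= 1 - 2*alpha caps alpha at 3/8.
    alpha = min(s / (2 * s + 2), (1 - MU_ADAPTIVE) / 2)
    beta = Fraction(1, 2) - alpha  # from 2*beta = 1 - 2*alpha
    return alpha, beta


def satisfies_constraints(s, alpha, beta, mu=MU_ADAPTIVE):
    """Check the polynomial-exponent constraints collected in the proof."""
    s = Fraction(s)
    return (alpha <= s / (2 * s + 2)
            and 2 * beta == 1 - 2 * alpha
            and Fraction(1, 2) - alpha <= mu <= 1 - 2 * alpha)


for s in (1, 2, 3, 5):
    alpha, beta = adaptive_exponents(s)
    print(f"s={s}: alpha={alpha}, beta={beta}, ok={satisfies_constraints(s, alpha, beta)}")
```

The rate $\epsilon_n = n^{-\alpha}$ (up to logarithmic factors) thus saturates at $\alpha = 3/8$ once $s \ge 3$.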

Non adaptive prior. The non adaptive case is much simpler since it
is sufficient to fix

\[ \mu_s = 1 - 2\alpha = 2/(2s+2) \]

and a compatible value of $\zeta$ to obtain suitable calibrations for $\alpha$, $\beta$, $\kappa$ and $\gamma$. This completes the
proof. □

5. Concluding remarks. In this paper, we exhibit a suitable prior
which yields a contraction rate of the posterior distribution near
the true underlying distribution $\mathbb{P}_{f^0,g^0}$. Moreover, this rate is polynomial
in the number $n$ of observations, even though our SIM is an inverse problem
with an unknown translation operator which depends on $g$. From a technical
point of view, the keystones of such results are the tight link between the
white noise model and the Fourier expansion, as well as the smoothness of
the Gaussian law, which permits an efficient covering strategy.

A natural problem is the study of the behaviour of the posterior
distribution regarding the functional objects themselves: the shape $f^0$ and the mixture law $g^0$. This
question is tackled in [BG13], where we establish a contraction of the posterior
distribution around $f^0$ and $g^0$ up to identifiability conditions.

Another interesting extension would be to consider the SIM with a noise level $\sigma$
depending on $n$ in the Bayesian framework. This asymptotic setting is linked
to the work of [BG12], in which $J$ curves are sampled at the $n$ points
of a discrete design in $[0,1]$.

At last, an open and challenging question concerns the search for a stochastic
algorithm to approximate the posterior distribution in our non parametric
Shape Invariant Model. One may think of an adaptation of the SA-EM strategy
proposed in [AKT10], even if this approach is at the moment valid only
in a parametric setting.

APPENDIX A: TOPOLOGY ON PROBABILITY SPACE 

Probability distances. We study consistency using standard distances over
probability measures. If $P$ and $Q$ are two probability measures over a set $X$,
absolutely continuous with respect to a reference measure $\lambda$, $d_H$ refers to the
Hellinger distance defined as

\[ d_H(P,Q) := \left( \int \left( \sqrt{\frac{dP}{d\lambda}} - \sqrt{\frac{dQ}{d\lambda}} \right)^2 d\lambda \right)^{1/2}. \]

Note that $d_H$ does not depend on the choice of the dominating measure
$\lambda$, and that the definition can be extended to any finite measures $P$ and $Q$
in a straightforward way.

When needed, we use the Total Variation distance between two probability
measures $P$ and $Q$. If $\mathcal{B}$ is the $\sigma$-algebra of measurable sets with the reference
measure $\lambda$, this distance is given by

\[ d_{TV}(P,Q) := \sup_{A \in \mathcal{B}} |P(A) - Q(A)| = \frac{1}{2} \int \left| \frac{dP}{d\lambda} - \frac{dQ}{d\lambda} \right| d\lambda. \]

At last, we recall the definition of the Kullback-Leibler divergence (entropy)
between $P$ and $Q$ since it is sometimes used in this work:

\[ d_{KL}(P,Q) := \int -\log\frac{dQ}{dP}\, dP. \]

In the sequel, we shall also use $V(P,Q)$ defined as a second order moment
associated to the Kullback-Leibler divergence

\[ V(P,Q) := \int \left( \log\frac{dP}{dQ} \right)^2 dP. \]

We recall the classical Pinsker inequality

\[ \text{(A.1)}\qquad \sqrt{\frac{1}{2}\, d_{KL}(P,Q)} \ge d_{TV}(P,Q), \]

as well as

\[ \text{(A.2)}\qquad \frac{1}{2}\, d_H(P,Q)^2 \le d_{TV}(P,Q) \le d_H(P,Q). \]
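As an illustration only, the sketch below evaluates these quantities for two discrete distributions (dominated by the counting measure) and checks (A.1), (A.2) and $d_H^2 \le d_{KL}$ numerically. The function names and the example vectors are hypothetical; the Hellinger distance is taken without the factor $1/2$, which is the normalisation for which (A.2) holds as written.

```python
import math


def hellinger(p, q):
    """d_H(P,Q) = sqrt(sum (sqrt(p_i) - sqrt(q_i))^2), no 1/2 normalisation."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def total_variation(p, q):
    """d_TV(P,Q) = (1/2) sum |p_i - q_i|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kullback_leibler(p, q):
    """d_KL(P,Q) = sum p_i log(p_i / q_i); requires q_i > 0 wherever p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
h, tv, kl = hellinger(p, q), total_variation(p, q), kullback_leibler(p, q)
print(f"d_H={h:.4f}  d_TV={tv:.4f}  d_KL={kl:.4f}")
print("(A.2):", 0.5 * h ** 2 <= tv <= h)
print("(A.1):", tv <= math.sqrt(0.5 * kl))
print("d_H^2 <= d_KL:", h ** 2 <= kl)
```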

Model Complexity. To obtain the posterior consistency and convergence
rate, we shall use results given by Theorem 2.1 of [GGvdV00], which is stated
below. This theorem exploits the notion of complexity of the studied model,
and this complexity is expressed through packing or covering numbers.
For any set of probability measures $\mathcal{P}$ endowed with a metric $d$, $D(\epsilon, \mathcal{P}, d)$
refers to the $\epsilon$-packing number (the maximum number of points in $\mathcal{P}$ such
that the distance between each pair is larger than $\epsilon$). The $\epsilon$-covering
number $N(\epsilon, \mathcal{P}, d)$ is the minimum number of balls of radius $\epsilon$ needed to cover
$\mathcal{P}$. These two numbers are linked through the following inequality

\[ N(\epsilon, \mathcal{P}, d) \le D(\epsilon, \mathcal{P}, d) \le N(\epsilon/2, \mathcal{P}, d). \]
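The inequality between packing and covering numbers can be observed on a toy example. The sketch below, given for illustration only, works with a finite set of points on the real line (not with probability measures), where both numbers are computed exactly by a left-to-right greedy sweep; the function names are hypothetical, and covering balls are closed intervals whose centres may lie anywhere on the line.

```python
def packing_number(points, eps):
    """Maximal number of points with pairwise distances > eps (exact in 1D)."""
    count, last = 0, None
    for x in sorted(set(points)):
        if last is None or x - last > eps:
            count += 1
            last = x
    return count


def covering_number(points, eps):
    """Minimal number of closed intervals of radius eps covering the points."""
    count, right_end = 0, None
    for x in sorted(set(points)):
        if right_end is None or x > right_end:
            count += 1
            right_end = x + 2 * eps  # interval [x, x + 2*eps], centred at x + eps
    return count


points = [0, 1, 2, 3, 10, 11, 25]
eps = 2
n_eps = covering_number(points, eps)
d_eps = packing_number(points, eps)
n_half = covering_number(points, eps / 2)
print(n_eps, d_eps, n_half)
print(n_eps <= d_eps <= n_half)
```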

At last, for $d$ a metric on finite measures, an $\epsilon$-bracket is a set of the form

\[ [L, U] := \left\{ P : \frac{dL}{d\lambda} \le \frac{dP}{d\lambda} \le \frac{dU}{d\lambda} \right\} \]

for $L$ and $U$ two finite measures such that $d(L,U) \le \epsilon$ and $\lambda$ any dominating
measure. The $\epsilon$-bracketing number $N_{[]}(\epsilon, \mathcal{P}, d)$ is the minimal number of
$\epsilon$-brackets needed to cover $\mathcal{P}$. Note that $N_{[]}(\epsilon, \mathcal{P}, d_H)$ is an upper bound of the
$(\epsilon/2)$-covering number $N(\epsilon/2, \mathcal{P}, d_H)$. The bracketing entropy is then defined
by $H_{[]}(\epsilon, \mathcal{P}, d) := \log N_{[]}(\epsilon, \mathcal{P}, d)$.

APPENDIX B: TOOLS FOR THE PROOF OF THEOREM 2.2

B.1. Entropy estimates.

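Proof of Proposition 3.1. The proof is similar to that of Lemma 1 of [GW00].
We set $p = 2\ell + 1$ and, for any $\epsilon > 0$, we build an explicit bracketing
of $\mathcal{A}_\theta$ and then bound $N_{[]}(\epsilon, \mathcal{A}_\theta, d_H)$. For an integer $K$ which will be chosen
in the sequel, we define the intervals $[\varphi_i^L, \varphi_i^U]$ of size $1/K$, with $\varphi_i^L = (i-1)/K$
and $\varphi_i^U = i/K$. For any $\delta > 0$, we consider the lower and upper brackets

\[ l_i := (1+\delta)^{-1}\, \gamma_{\theta\bullet\varphi_i^L,\,(1+\delta)^{-a}\mathrm{Id}} \quad\text{and}\quad U_i := (1+\delta)\, \gamma_{\theta\bullet\varphi_i^L,\,(1+\delta)^{a}\mathrm{Id}}. \]

We are looking for admissible values of $a$, $\delta$ and $K$ such that the set
$([l_i, U_i])_{i=1\ldots K}$ is an $\epsilon$-bracketing of $\mathcal{A}_\theta$ for $d_H$. Of course, for all $\varphi \in [\varphi_i^L, \varphi_i^U]$,
$l_i \le \gamma_{\theta\bullet\varphi,\mathrm{Id}}(\cdot) \le U_i$ should hold, but we can check that, for all $x \in \mathbb{C}^p$,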



k{x) 



< 



1 



1 



||e.y;-fl.y>. 11^ ^ 



le.vMx) - 1 + <5 (1 + 6yp^ 
Hence, we must have a < 1/p, and we must also satisfy 

I — pa ,^ ,^ , a(l — pa)(5^ 



< 



:i-(i+5)-")iog(i+5) 



where o(l) does not depend on p and goes to zero as 5 — )■ uniformly in a in 
any positive neighbourhood of zero. In a same way considering "fg,ipjdU~^, 
we obtain 



Vx G C 



7e»ip,idix) 

Ui{x) 



and the same conditions arise. In order to minimize the cardinality of the
bracketing, the size of the intervals $[\varphi_i^L, \varphi_i^U]$ must be as large as possible; we then maximize $a(1 - pa)$ and
choose $a = (2p)^{-1}$.

We must now check that $d_H(l_i, U_i) \le \epsilon$. Rapid computations show that



duik^Uif = 5^ + dni^g. 



ipL,(l+<5)-"M 



(•),79., 



ipt,(l+S)°'Id 



Using standard formula on Hellinger distance for multivariate gaussian laws, 
we obtain 



dH{k,Ui 



6^ + 2 



6^ + 2 



1 



2P 



2PVTTS 



(1 + (1 + 5)1/ 
One can easily check that, whatever $p \ge 1$, $\left(1 + (1+\delta)^{1/p}\right)^p \le 2^p e^{\delta/2}$, which
yields

for $\delta$ small enough. An admissible choice of $\delta$ is $\delta = \epsilon/\sqrt{2}$, which
ensures $d_H(l_i, U_i) \le \epsilon$. We then obtain



167r2p||0||2^^ 327r2p||0||2^/ 



where $o(\epsilon^2)$ does not depend on $p$. The number of brackets $K$ is the inverse of the interval size,
which ends the proof of the proposition. □

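Proof of Proposition 3.3. We first fix the notation $p = 2\ell + 1$, which
refers to the dimension of the multivariate mixture. For any $R > 0$ which will
be chosen later, $\mathcal{B}_R$ is the ball of $\mathbb{C}^p$ of radius $R$. For the sake of simplicity, we
will sometimes omit the dependence on $\ell$ with the notation $p$. According to
the hypotheses in Proposition 3.3, there exists an absolute constant $a$ such
that $\|\theta\| \le w \le a\sqrt{p}$. We first write

\[ d_{TV}(\mathbb{P}_{\theta,g}, \mathbb{P}_{\theta,\tilde g}) \le \underbrace{\frac{1}{2} \int_{\mathcal{B}_R^c} \left| d\mathbb{P}_{\theta,g} - d\mathbb{P}_{\theta,\tilde g} \right|(z)}_{=:(A)} + \underbrace{\frac{1}{2} \int_{\mathcal{B}_R} \left| d\mathbb{P}_{\theta,g} - d\mathbb{P}_{\theta,\tilde g} \right|(z)}_{=:(B)}. \]

Let $\nu$ be a measure on $[0,1]$ that dominates both $g$ and $\tilde g$.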

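Term (A). We pick $R$ such that (A) is smaller than $\epsilon/2$. First set $R^2 \ge
(1+a)^2 p \ge a^{-2}(1+a)^2\|\theta\|^2$; with this choice,

\[ \forall \varphi \in [0,1] \quad \forall z \in \mathcal{B}_R^c \qquad \|z - \theta\bullet\varphi\| \ge \|z\|/(1+a). \]

This simply implies that

\[ (A) \le \pi^{-p} \int_{\mathcal{B}_R^c} \int_0^1 e^{-\|z\|^2/(1+a)^2} \left| \frac{dg}{d\nu}(\varphi) - \frac{d\tilde g}{d\nu}(\varphi) \right| d\nu(\varphi)\, dz
\le 2(1+a)^{2p}\, \mathbb{P}\left( \chi^2_{2p} \ge \frac{2R^2}{(1+a)^2} \right). \]

To deal with the last term we use a concentration inequality for chi-square
statistics (see Lemma 1 of [IL06]): for any $k \ge 1$ and $c > 0$,

\[ \text{(B.1)}\qquad \mathbb{P}\left( \chi^2_k \ge (1+c)k \right) \le \frac{1}{c\sqrt{2\pi}}\, e^{-\frac{k}{2}[c - \log(1+c)] - \frac{1}{2}\log k}. \]

Therefore, writing $R^2 = (1+a)^2(1+c)p$ for $c > 0$, one gets

\[ (A) \le \frac{2}{c\sqrt{2\pi}}\, e^{-p[c - \log(1+c) - 2\log(1+a)] - \frac{1}{2}\log p}, \]

and this term is smaller than $\epsilon/2$ if we pick $c$ large enough, since $\log\frac{1}{\epsilon} \lesssim p$.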

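Term (B). We then consider (B), following the strategy of [GvdV01] which
exploits the smoothness of Gaussian densities. We will exhibit a discrete
mixture law which will be close to $\mathbb{P}_{\theta,g}$, for any given $g$. Taylor's expansion
theorem yields:

\[ \text{(B.2)}\qquad \forall k \in \mathbb{N} \quad \forall y \in \mathbb{R}_+ \qquad e^{-y} = \sum_{j=0}^{k-1} \frac{(-y)^j}{j!} + R_k(y), \qquad |R_k(y)| \le \frac{y^k}{k!}. \]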
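The Taylor remainder bound of (B.2) can be checked numerically. The sketch below assumes the reading of (B.2) in which $R_k(y)$ denotes the remainder of the order-$k$ expansion of $e^{-y}$ for $y \ge 0$; it also checks the cruder bound $y^k/k! \le (ey/k)^k$, which follows from $k! \ge (k/e)^k$. The function names are hypothetical.

```python
import math


def taylor_remainder(y, k):
    """R_k(y) = exp(-y) - sum_{j<k} (-y)^j / j!."""
    partial = sum((-y) ** j / math.factorial(j) for j in range(k))
    return math.exp(-y) - partial


def remainder_bound(y, k):
    """Lagrange-type bound y^k / k!, valid for y >= 0."""
    return y ** k / math.factorial(k)


for y in (0.5, 2.0, 7.5):
    for k in (1, 4, 12):
        r = abs(taylor_remainder(y, k))
        b = remainder_bound(y, k)
        print(f"y={y:4.1f} k={k:2d}  |R_k|={r:.3e}  y^k/k!={b:.3e}  (ey/k)^k={(math.e * y / k) ** k:.3e}")
```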



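Thus, for all $z \in \mathcal{B}_R$, we have

\[
\int_0^1 e^{-\|z - \theta\bullet\varphi\|^2} \left[ \frac{dg}{d\nu}(\varphi) - \frac{d\tilde g}{d\nu}(\varphi) \right] d\nu(\varphi)
= \sum_{j=0}^{k-1} \frac{(-1)^j}{j!} \int_0^1 \|z - \theta\bullet\varphi\|^{2j} \left[ \frac{dg}{d\nu}(\varphi) - \frac{d\tilde g}{d\nu}(\varphi) \right] d\nu(\varphi)
+ \int_0^1 R_k\left(\|z - \theta\bullet\varphi\|^2\right) \left[ \frac{dg}{d\nu}(\varphi) - \frac{d\tilde g}{d\nu}(\varphi) \right] d\nu(\varphi).
\]

We now decompose $\theta = (\theta_{-\ell}, \ldots, \theta_\ell)$ and $z = (z_{-\ell}, \ldots, z_\ell)$ using polar
coordinates: $\theta_m = \rho_m^{(1)} e^{\mathrm{i}\alpha_m}$ and $z_m = \rho_m^{(2)} e^{\mathrm{i}\beta_m}$ for $|m| \le \ell$. This leads to

\[ \|z - \theta\bullet\varphi\|^2 = \|z\|^2 + \|\theta\|^2 - 2 \sum_{m=-\ell}^{\ell} \rho_m^{(1)} \rho_m^{(2)} \cos(\beta_m - \alpha_m - m\varphi). \]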

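For any integer $j < k$, we deduce that

\[ \|z - \theta\bullet\varphi\|^{2j} = c_j(z,\theta) + \sum_{r=1}^{j} \sum_{m=-\ell}^{\ell} a_{r,m}(z,\theta) \left[ \cos(\beta_m - \alpha_m - m\varphi) \right]^r, \]

where $(a_{r,m})_{r=1\ldots k,\, m=-\ell\ldots\ell}$ is a complex matrix which only depends on $z$
and $\theta$. Using Euler's identity,

\[ \|z - \theta\bullet\varphi\|^{2j} = c_j(z,\theta) + \sum_{r=-j\ell}^{j\ell} b_r(z,\theta)\, e^{\mathrm{i} r \varphi}, \]

where $b$ stands for a complex vector obtained by the Binomial formula and the
coefficients $a_{r,m}(z,\theta)$. Consequently, for all $z \in \mathcal{B}_R$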



k-l \j /.I 



j=0 



j! Jo 



+ ^ br{z,ti)e 

r=-jl 



du{Lp) 



av dv 



du{ip) 



k-l 



{-ly 



j=0 



Cj{z,e)co{g - g) + ^ br(z,9)cr{g - g) 

r=-jl 



+ 7r-P / Rk{\\z-e.^f) 



dg f . dg , 
dv dv 



dv{ip). 



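Carathéodory's theorem shows that one can find $\tilde g$ with a finite support of
size $2(k-1)\ell + 1 \approx 2k\ell$ such that

\[ \forall r \in [-(k-1)\ell, (k-1)\ell] \qquad c_r(\tilde g) = c_r(g). \]

For such a finite mixture law we obtain, for all $z \in \mathbb{C}^p$,

\[ \mathbb{P}_{\theta,g}(z) - \mathbb{P}_{\theta,\tilde g}(z) = \pi^{-p} \int_0^1 R_k\left(\|z - \theta\bullet\varphi\|^2\right) \left[ \frac{dg}{d\nu}(\varphi) - \frac{d\tilde g}{d\nu}(\varphi) \right] d\nu(\varphi), \]

and of course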



{B) < -K" 



RkAWz 



<27r~f sup Rki^z 
ze£'fl,i/5e(o,i) 



dg . , d~g . 

Tv^^^-Tv^^^ 



dv{ip) 



dz 



According to the choice $R^2 = (1+a)^2(1+c)p$, which implies that $\|z - \theta\bullet\varphi\|^2 \le
(1+2a)^2(1+c)p$ for $z \in \mathcal{B}_R$, and using the volume of $\mathcal{B}_R$ and Stirling's formula, we
obtain



iB) < 



vr 



(e(l + 2a)\l + c)p)' 7rP[(l + a)^(l + c)p]P 
where we used in the last equation /p\ < C^. If we define the threshold
$k$ in (B.2) such that $k \ge b\ell$ for a sufficiently large $b$, we then obtain for a
universal $C$:



(B) 



\clFf) 



£r 



In order to bound (B) by $\epsilon/2$, we thus choose $k \approx b\ell$ for a sufficiently large
absolute constant $b$. For such a choice, since $\log\frac{1}{\epsilon} \lesssim \ell$, we have found $\tilde g$ with
a discrete support of cardinality $\approx 2b\ell^2$ points, with $b$ not depending on $g$,
such that

\[ d_{TV}(\mathbb{P}_{f,g}, \mathbb{P}_{f,\tilde g}) \le \epsilon/2. \]

Now, the first inequality in Proposition 3.3 comes from Proposition 3.2.

The second inequality in Proposition 3.3 is proved from the first one, using
the relation $\|\theta\|_{\mathcal{H}_1} \le \ell\|\theta\|$ valid for any $f \in \mathcal{H}^{\ell}$. □

Proof of Lemma 3.1. We follow a straightforward argument: $\mathbb{P}_{f,g}$ is a
mixture model so



7,9 



^0 



Thus 



dTv{rf,g, 



f,9 



< 



dg{a 



TV 



TV 



TV 

<dH 



Assume now $Y \sim \mathbb{P}_{f,\delta_0}$, hence from (2.1) $dY = f(x)dx + dW$, where $W$ is
a complex standard Brownian motion. If we denote by $U$ a random variable with law
$\mathcal{N}_{\mathbb{C}}(0,1)$, a standard argument using Girsanov's formula yields



2 1 -E 



dF 



2(1- Kf^sjexp (2^e{f - f, dW) -\\f- /P 

-wf-fr 




exp(^||/-/Pe(C/) 
II/- /f 
□



B.2. Link between Kullback-Leibler and Hellinger neighbourhoods.

Proof of Proposition 3.4. This proposition uses a corollary of Rice's
formula (see [AW09] for various applications of this formula), stated in
Lemma B.1 and postponed until after this proof.

We begin with Girsanov's formula (3.1). Write now $Y = f^{0,-\tau} + W$, where
$W$ stands for a complex standard Brownian motion independent of the random
shift $\tau$ (whose law is $g^0$). The norm is invariant under any shift, thus



where the last inequality is obtained using Cauchy-Schwarz's inequality and 
the notations 



We now set $\delta \in (0,1]$ (it will be precisely fixed in the sequel) and we define
the trajectories $E_\delta$ as






Hence, following the definition of $M_\delta^2$ in (3.2), we have

For $\delta$ small enough ($\delta <$



2{||/ll + il/»ll)"^- 

Ms < e 11^ Ee ^ 1 '>lz,+z,>l-{\\f\\+\\n\r 

< eWII+ll/°ll)^Ee^(^i+^2)i 

Integrating by parts the last expectation, the use of Lemma B.l yields 

r+oo 

i 



Zi logn ' 
2 - 45 ' 



^ ^ logtt 
2-4(5 



du 



(B.3) Mi<Cif,f)e 



fv^dl/ll+ll/^ll)" 



+ 00 



16i2||/0||2 _|_ g 16i2||/||2 



Now, we can choose $\delta$ positive and small enough such that $M_\delta^2 < \infty$,
since for $u \ge \sqrt{e}$, we have



log (u) _ log(n) 



1/3252||/0|| 



u 

which is an integrable function as soon as $\delta^2 < \frac{1}{32\|f^0\|^2}$, and the same holds
with $f$ instead of $f^0$. Note that $M_\delta^2$ is uniformly bounded if $f$ is picked in
a ball centered at $0$ with radius $2\|f^0\|$. □

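We now show that the technical inequality used in (B.3) is satisfied.

Lemma B.1. Let $W$ be a complex standard Brownian motion and $u$ a complex
1-periodic map of $\mathcal{H}_s$. We assume that $u$ is of class $C^2$. Then, when
$t/\|u\| \to +\infty$, we have

\[ \mathbb{P}\left( \sup_{\alpha \in [0,1]} \Re\langle u^{-\alpha}, dW\rangle > t \right) \sim \frac{\|u'\|}{2\pi\|u\|} \exp\left( -\frac{t^2}{\|u\|^2} \right). \]

In particular, if $u \in \mathcal{H}^{\ell}$, we have

\[ \mathbb{P}\left( \sup_{\alpha \in [0,1]} \Re\langle u^{-\alpha}, dW\rangle > t \right) \lesssim \ell \exp\left( -\frac{t^2}{\|u\|^2} \right). \]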



Proof. We define the following process

\[ \forall \alpha \in [0,1] \qquad X(\alpha) := \frac{\sqrt{2}\, \Re\left( \int_0^1 u(s - \alpha)\, dW_s \right)}{\|u\|}. \]

$X$ is a centered Gaussian process. Its covariance function is given by

\[ \Gamma(t) = \mathbb{E}[X(0)X(t)]. \]

Obviously, one has $\Gamma(0) = 1$ and the Cauchy-Schwarz inequality implies that
$\Gamma(s) \le \Gamma(0)$. Moreover, since $\Gamma$ is $C^2([0,1])$, we deduce that $\Gamma'(0) = 0$ and
a simple computation yields

\[ \Gamma''(0) = -\frac{\int_0^1 u'(s)\overline{u'(s)}\, ds}{\|u\|^2} = -\frac{\|u'\|^2}{\|u\|^2}. \]

Rice's formula (see for instance exercise 4.2, chapter 4 of [AW09]) then yields
that when $t \to +\infty$, we have

\[ \mathbb{P}\left( \sup_{\alpha} X(\alpha) > t \right) \sim \frac{\|u'\|}{2\pi\|u\|}\, e^{-t^2/2}. \]

This ends the proof of the first inequality. Assume furthermore that $u \in
\mathcal{H}^{\ell}$; Parseval's equality implies that $\|u'\| \le 2\pi\ell\|u\|$ and we obtain the second
inequality. □



B.3. Hellinger neighbourhoods. 



Proof of Proposition 3.5. Recall first that if $Y$ follows $\mathbb{P}_{f^0,g^0}$, one
shift $\beta$ is randomly sampled according to $g^0$. Conditionally on this shift $\beta$, $Y$
is described through a white noise model $dY(x) = f^0(x - \beta)dx + dW(x)$. For
any function $F$ of the trajectory $Y$, we denote by $\mathbb{E}_\beta F(Y)$ the expectation
of $F(Y)$ given that the shift is equal to $\beta$, and of course one
has

\[ \mathbb{E}_0[F(Y)] = \int_0^1 \mathbb{E}_\beta[F(Y)]\, dg^0(\beta). \]

For each possible value of $\beta \in [0,1]$, we define

\[ D_\beta(\alpha) := \exp\left( 2\Re\langle f^{0,-\alpha}_{\ell_n}, f^{0,-\beta}\rangle + 2\Re\langle f^{0,-\alpha}_{\ell_n}, dW\rangle - \|f^0_{\ell_n}\|^2 \right), \]
\[ X_\beta(\alpha) := \exp\left( 2\Re\langle (f^0 - f^0_{\ell_n})^{-\alpha}, f^{0,-\beta}\rangle + 2\Re\langle (f^0 - f^0_{\ell_n})^{-\alpha}, dW\rangle - \|f^0 - f^0_{\ell_n}\|^2 \right). \]

We can now split the randomness of the Brownian motion into two parts:
the first one is spanned by the Fourier frequencies from $-\ell_n$ to $\ell_n$ and the
second part is its orthogonal complement (in $L^2$): $W = W_1 + W_2$. Of course, $W_1$ and $W_2$
are independent.
Moreover, $\langle f^{0,-\alpha}_{\ell_n}, dW\rangle = \langle f^{0,-\alpha}_{\ell_n}, dW_1\rangle$ and $\langle (f^0 - f^0_{\ell_n})^{-\alpha}, dW\rangle = \langle (f^0 -
f^0_{\ell_n})^{-\alpha}, dW_2\rangle$. For any fixed $\beta$, $D_\beta(\alpha)$ is measurable with respect to the
filtration associated to $W_1$ and $X_\beta(\alpha)$ is independent of $W_1$. We thus obtain,
using Jensen's inequality and this filtration property, that



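\[
\begin{aligned}
(E_1)^2 &\le \mathbb{E}_0 \log \frac{\int_0^1 D_\beta(\alpha) X_\beta(\alpha)\, dg^0(\alpha)}{\int_0^1 D_\beta(\alpha)\, dg^0(\alpha)} \\
&\le \log \int_0^1 \mathbb{E}_\beta^{W_1} \mathbb{E}_\beta^{W_2} \left[ \frac{\int_0^1 D_\beta(\alpha) X_\beta(\alpha)\, dg^0(\alpha)}{\int_0^1 D_\beta(\alpha)\, dg^0(\alpha)} \right] dg^0(\beta) \\
&= \log \int_0^1 \mathbb{E}_\beta^{W_1} \left[ \frac{\int_0^1 D_\beta(\alpha)\, \mathbb{E}_\beta^{W_2}[X_\beta(\alpha)]\, dg^0(\alpha)}{\int_0^1 D_\beta(\alpha)\, dg^0(\alpha)} \right] dg^0(\beta) \\
&\le \log \int_0^1 \left( \sup_\alpha \mathbb{E}_\beta^{W_2}[X_\beta(\alpha)] \right) dg^0(\beta).
\end{aligned}
\]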

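The notation $\mathbb{E}_\beta^{W_1} F(Y)$ (resp. $\mathbb{E}_\beta^{W_2} F(Y)$) used above refers to the expectation
of $F(Y)$ with respect to $W_1$ (resp. with respect to $W_2$) with a fixed $\beta$.
Now, one should remark that $X_\beta(\alpha)$ has the same law as

\[ \exp\left( 2\Re\langle (f^0 - f^0_{\ell_n})^{-\alpha}, f^{0,-\beta}\rangle + U \right), \]

where $U \sim \mathcal{N}\left(-\|f^0 - f^0_{\ell_n}\|^2,\, 2\|f^0 - f^0_{\ell_n}\|^2\right)$, and $\mathbb{E}[e^U] = 1$. Hence

\[
(E_1)^2 \le \log \int_0^1 \sup_\alpha \exp\left( 2\Re\langle (f^0 - f^0_{\ell_n})^{-\alpha}, f^{0,-\beta}\rangle \right) dg^0(\beta)
\le \log \sup_{\alpha,\beta} \exp\left( 2\Re\langle (f^0 - f^0_{\ell_n})^{-\alpha}, f^{0,-\beta}\rangle \right).
\]

We can now switch log and sup since log is increasing, and we obtain

\[ (E_1) \le \sqrt{2 \sup_{\alpha,\beta} \Re\langle (f^0 - f^0_{\ell_n})^{-\alpha}, f^{0,-\beta}\rangle}. \]

Again, we can use the orthogonal decomposition $f^{0,-\beta} = f^{0,-\beta}_{\ell_n} + (f^{0,-\beta} - f^{0,-\beta}_{\ell_n})$,
and the Cauchy-Schwarz inequality yields $(E_1) \le \sqrt{2}\,\|f^0 - f^0_{\ell_n}\|$.

Note that until now we did not use the hypothesis $f^0 \in \mathcal{H}_s$. It is only
needed to get the last inequality in Proposition 3.5. □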



To establish Lemma 3.2, we first recall the following useful result.

Lemma B.2. For any dimension $p$ and any couple of points $(z_1, z_2) \in
\mathbb{C}^p$, if $\|z_1 - z_2\|$ is the Euclidean distance in $\mathbb{C}^p$, then one has

dTv{lzi,7z2) = ^hzi -722IIL1 = 

where $\Phi$ stands for the cumulative distribution function of a real standard
Gaussian variable.



2$ 



\Zl - Z2\ 



< 



\Zl - Z2\ 



2tt 
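Lemma B.2 rests on the classical closed form for the Total Variation distance between two Gaussian laws sharing the same isotropic covariance: $d_{TV}(\mathcal{N}(m_1,\sigma^2 \mathrm{Id}), \mathcal{N}(m_2,\sigma^2 \mathrm{Id})) = 2\Phi(\|m_1-m_2\|/(2\sigma)) - 1 \le \|m_1-m_2\|/(\sigma\sqrt{2\pi})$. The sketch below checks this identity numerically in dimension one. It is written for a generic real variance $\sigma^2$; which value of $\sigma$ matches the normalisation of $\gamma_z$ used in the paper is not asserted here, and the function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of a real standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tv_gaussian_shift(dist, sigma=1.0):
    """Closed form for TV between N(m1, sigma^2 Id) and N(m2, sigma^2 Id), dist = ||m1 - m2||."""
    return 2.0 * std_normal_cdf(dist / (2.0 * sigma)) - 1.0


def tv_numeric_1d(m1, m2, sigma=1.0, n=20000, width=10.0):
    """Midpoint-rule approximation of (1/2) * integral |p - q| in dimension one."""
    lo = min(m1, m2) - width * sigma
    hi = max(m1, m2) + width * sigma
    h = (hi - lo) / n
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        p = norm * math.exp(-0.5 * ((x - m1) / sigma) ** 2)
        q = norm * math.exp(-0.5 * ((x - m2) / sigma) ** 2)
        total += abs(p - q)
    return 0.5 * total * h


for dist in (0.1, 1.0, 3.0):
    exact = tv_gaussian_shift(dist)
    approx = tv_numeric_1d(0.0, dist)
    linear = dist / math.sqrt(2.0 * math.pi)
    print(f"dist={dist}: closed form={exact:.6f}  numeric={approx:.6f}  linear bound={linear:.6f}")
```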



Proof of Lemma 3.2. Adapting the proof of Lemma 5.1 of [GvdVOl], 

J 

+ 2^^ - ipj + 7?/2]) -p,\. 

Using Lemma B.2 ends the proof. □ 



Proof of Proposition 3.7. The construction used in the proof of
Proposition 3.3 provides a mixture $\bar g$ supported by $\bar J_n := C\ell_n^2$
points (denoted $(\varphi_j)_{j=1\ldots\bar J_n}$) such that $d_H(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f^0,\bar g}) \le \epsilon_n$. Therefore
$\bar g = \sum_{j=1}^{\bar J_n} w_j \delta_{\varphi_j}$. As pointed out by [GvdV01], one can slightly modify $\bar g$ so that
the support points are separated enough, as follows. First, denote by $(\psi_j)_{j=1\ldots J_n}$
the subset of $(\varphi_j)_{j=1\ldots\bar J_n}$ which is $\eta_n$-separated with a maximal number
of elements. Hence, $J_n \le \bar J_n$ and, up to a permutation, one can divide
$(\varphi_j)_{j=1\ldots\bar J_n}$ in two parts: $(\varphi_j)_{j=1\ldots J_n} = (\psi_j)_{j=1\ldots J_n}$ and $(\varphi_j)_{j=J_n+1\ldots\bar J_n}$. For any
$i \in \{J_n+1, \ldots, \bar J_n\}$, we define $\psi_{j(i)}$ as the closest point of $(\psi_j)_{j=1\ldots J_n}$ to $\varphi_i$; the
new discrete mixture law is then given by

\[ \tilde g := \sum_{j=1}^{J_n} \left( w_j + \sum_{i > J_n \,\mid\, j(i) = j} w_i \right) \delta_{\psi_j}. \]

Of course, $\tilde g$ has a support which is $\eta_n$-separated. Moreover, we have



J n 



i=l i=l 



i>Jn 



dz 



3=1 i>Jn\j(i)=j 



Then, Fubini's theorem yields 

j='^i>Jn\j{i)=j 



dz. 



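and we deduce from Lemma B.2 that

\[ d_{TV}\left(\mathbb{P}_{f^0,\bar g}, \mathbb{P}_{f^0,\tilde g}\right) \le \sqrt{2\pi} \sum_{j=1}^{J_n} \sum_{i > J_n \,\mid\, j(i) = j} w_i\, \|\theta\|_{\mathcal{H}_1} \eta_n \le \sqrt{2\pi}\, \|\theta\|_{\mathcal{H}_1} \eta_n. \]

Now the relations between Hellinger and Total Variation distances (A.2)
yield

\[ d_H\left(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f^0,\tilde g}\right) \le \epsilon_n + d_H\left(\mathbb{P}_{f^0,\bar g}, \mathbb{P}_{f^0,\tilde g}\right) \le \left( 1 + (8\pi)^{1/4} \|\theta\|_{\mathcal{H}_1}^{1/2} \right) \epsilon_n. \]

Lemma 3.2 permits to conclude. □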
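The merging step used in this proof can be sketched as follows. This is an illustration under stated assumptions: the separated subset is built greedily, which gives a subset that is maximal for inclusion but not necessarily of maximal cardinality as in the proof, and the function name and the example mixture are hypothetical.

```python
def separate_support(points, weights, eta):
    """Merge a discrete mixture onto an eta-separated subset of its support.

    Returns (kept_points, merged_weights): each discarded atom gives its
    weight to the closest kept point, which lies at distance < eta.
    """
    kept = []
    for x in sorted(points):
        # Keep x only if it is at distance >= eta from every point already kept.
        if all(abs(x - y) >= eta for y in kept):
            kept.append(x)
    merged = {y: 0.0 for y in kept}
    for x, w in zip(points, weights):
        closest = min(kept, key=lambda y: abs(x - y))
        merged[closest] += w
    return kept, [merged[y] for y in kept]


points = [0.10, 0.12, 0.50, 0.52, 0.90]
weights = [0.2, 0.2, 0.2, 0.2, 0.2]
kept, merged = separate_support(points, weights, eta=0.05)
print(kept)
print(merged)
```

The total mass is preserved and every atom moves by less than $\eta$, which is what the Total Variation bound derived from Lemma B.2 exploits.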
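B.4. Checking the conditions of Theorem 2.1.

Proof of Proposition 4.1. We have seen in the proof of Proposition
3.4 that $M_\delta^2$ is uniformly bounded with respect to $\|f\|$ and $\|f^0\|$ for a suitable
choice of $\delta$. We restrict our study to the elements $f$ such that $\|f\| \le 2\|f^0\|$.

We know from Proposition 3.4 and Theorem 3.2 that as soon as $\tilde\epsilon_n \log\frac{1}{\tilde\epsilon_n} \le
c\,\epsilon_n$ with $c$ small enough:

\[ \tilde{\mathcal{V}}_{\tilde\epsilon_n}(\mathbb{P}_{f^0,g^0}, d_H) := \left\{ \mathbb{P}_{f,g} \in \mathcal{P} \mid d_H(\mathbb{P}_{f^0,g^0}, \mathbb{P}_{f,g}) \le \tilde\epsilon_n \text{ and } \|f\| \le 2\|f^0\| \right\}
\subset \mathcal{V}_{\epsilon_n}(\mathbb{P}_{f^0,g^0}, d_{KL}). \]

This last condition on $\tilde\epsilon_n$ is true as soon as

\[ \text{(B.4)}\qquad \tilde\epsilon_n := c\,\epsilon_n \left( \log\frac{1}{\epsilon_n} \right)^{-1} \]

with $c$ small enough. Now, Proposition 3.8 permits to describe a subset of
$\tilde{\mathcal{V}}_{\tilde\epsilon_n}(\mathbb{P}_{f^0,g^0}, d_H)$, through the definition of the subsets $\mathcal{F}_{\tilde\epsilon_n}$ and $\mathcal{G}_{\tilde\epsilon_n}$ for $f$ and $g$. Choose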

We first bound the prior mass on $\mathcal{G}_{\tilde\epsilon_n}$. This follows from the lower bound
given by Lemma 4.1. The prior for $g$ is a Dirichlet process with a finite base
measure $\alpha$ admitting a continuous positive density on $[0,1]$. Since $\eta_n$ goes
to zero, for n large enough a{'il:j — r/„/2,'0j + r/n/2) for any j = 1 . . . J„. 

Note that Jn ^'ii = ^n'^^ < e^^- Thus, there exists an absolute constant 
a S (0, 1] such that the condition J„ < 2(ae„)^^ is fulfilled, and one can find 
universal constants C and c such that for n large enough 

(B.5) n„ (g,J > n„ (g„,J > Ce-^'-'^'^k > Ce-^-'^^^. 

We next consider the prior mass on $\mathcal{F}_{\tilde\epsilon_n}$. Remark that when $n$ is large
enough, any element of $\mathcal{F}_{\tilde\epsilon_n}$ satisfies $\|f\| \le 2\|f^0\|$ and the additional
condition on $\|f\|$ in the definition of $\tilde{\mathcal{V}}_{\tilde\epsilon_n}(\mathbb{P}_{f^0,g^0}, d_H)$ is automatically fulfilled.
Remark that from the construction of our prior on $f$, one has
n„(J-,J>A(4)x7r,„ (S (Cet)). 

From our assumption on the prior $\lambda$, we have $\lambda(\ell_n) \ge e^{-c\,\ell_n^2 \log^\rho \ell_n}$, and the
value of the volume of the (4^„ + 2)-dimensional Euclidean ball of radius 
implies 

For $n$ large enough we get

(J-gJ >exp- {cll\ognn-ri-^ 

+(2£„ + 1) (log^ + 41og(l/e„) - logC' + 
(B.6) >exp[-(c + o(l)) [^^logP^^v^-^]] 
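Gathering (B.5) and (B.6), the relations $\eta_n = \tilde\epsilon_n^2$ and (B.4) lead to

\[
\begin{aligned}
\Pi_n\left( \mathcal{V}_{\epsilon_n}(\mathbb{P}_{f^0,g^0}, d_{KL}) \right)
&\ge \exp\left[ -(c + o(1)) \left[ \ell_n^2 \log^\rho(\ell_n) \vee n^{\mu_s}(\log n)^{\zeta} \right] \right] \\
&\ge \exp\left[ -(c + o(1)) \left[ \tilde\epsilon_n^{-2/s} \log^\rho(1/\tilde\epsilon_n) \vee n^{\mu_s}(\log n)^{\zeta} \right] \right] \\
&\ge \exp\left[ -(c + o(1)) \left[ \epsilon_n^{-2/s} \left(\log(1/\epsilon_n)\right)^{\rho + 2/s} \vee n^{\mu_s}(\log n)^{\zeta} \right] \right]
\end{aligned}
\]

for constants $c > 0$.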



□ 



Proof of Proposition 4.2. The upper bound on the packing number
comes directly from Theorem 3.1, given our choice of $w_n$ in Proposition 4.2.

Now, to control the prior mass outside the sieve, remark first that owing
to the construction of our prior, we have


(B.7) 



n„(P\Pfc„,^J< Yl A(fe)+Pr Yl l^'^'l'^ 



|fc|>fcn 



Jfe|<fc„ 



where each $\theta_k$ for $-k_n \le k \le k_n$ follows a centered Gaussian law whose variance is fixed by (2.6).
Now, there exist some constants $c$ and $C$ such that for sufficiently large
$n$:

\[ \sum_{|k| > k_n} \lambda(k) \le C\lambda(k_n) \le e^{-c\,k_n^2 \log^\rho(k_n)}. \]

Regarding now the second term of the upper bound in (B.7), we use (B.1)
to get


|fc|<fc„ 



\|fc|<fcn 

<lP(xk+i>2(2A:„ + l)C') 
1 



< 



(2fc„+l)[e-^-l-log^-i-log(2fc„+l)/2 



Now, using the value of we obtain 

This concludes the proof of the Proposition. 



□